Spark Access Control in Qubole

Achuth Rajagopal and Shridhar Ramachandran show off the Spark Data Access Control Framework on Qubole’s platform:

With these requirements in mind, we decided to implement Hive Authorization as our first Policy Manager. Hive Authorization policies are stored in the Qubole Metastore which acts as a shared central component and stores metadata related to Hive Resources like Hive Tables. We enhanced Spark to honor the policies stored in the Qubole Metastore while accessing Hive Tables or for adding and modifying those policies.

In summary, we implemented a SQL standard access control layer identical to what is present in Apache Hive or Presto today. The following sections detail the architecture and provide an example that illustrates how it works.

Click through to learn more.

Related Posts

Comparing Performance: HBase1 vs HBase2

Surbhi Kochhar takes us through performance improvements between HBase version 1 and HBase version 2: We are loading the YCSB dataset with 1000,000,000 records with each record 1KB in size, creating total 1TB of data. After loading, we wait for all compaction operations to finish before starting workload test. Each workload tested was run 3 […]

Read More

The Transaction Log in Delta Tables

Burak Yavuz, et al, explain how the transaction log works with Delta Tables in Apache Spark: When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit […]

Read More

Categories

July 2019
MTWTFSS
« Jun Aug »
1234567
891011121314
15161718192021
22232425262728
293031