Saumitra Buragohain, et al., announce the newest version of the Hortonworks Data Platform:
Highlighted Apache Hive features include:
- Workload management for LLAP: You can assign resource pools within the LLAP pool and allocate resources on a per-user or per-group basis. This enables support for large multi-tenant deployments.
- ACID v2 and ACID on by default: We are releasing ACID v2. With the performance improvements in both the storage format and the execution engine, we are seeing equal or better performance compared to non-ACID tables. Thus we are turning ACID on by default and enabling full support for data updates.
- Hive Warehouse Connector for Spark: The Hive Warehouse Connector allows you to connect Spark applications with Hive data warehouses. The connector automatically handles ACID tables. This enables data science workloads to work well with data in Hive.
- Materialized view navigation: Materialized views allow you to pre-aggregate and pre-compute tables used in queries. They typically work best on sub-queries or intermediate tables. The cost-based optimizer will automatically plan a query to use those intermediate results if they are available, drastically speeding up your queries.
- Information schema: Hive now exposes the metadata of the database (tables, columns, etc.) directly via the Hive SQL interface.
- JDBC storage connector: You can now map any JDBC database into Hive's catalog. This means you can join data across Hive and other databases using the Hive query engine.
This looks pretty good. So of course I learn about it two days after I rebuild my demo Hadoop cluster.
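
To make the materialized view item a bit more concrete, here is a minimal sketch of the idea in Python. It does not touch Hive at all: it uses the standard library's `sqlite3` as a stand-in, and the `sales` and `sales_by_region` tables are hypothetical names I made up for illustration. In Hive, the precomputed table would be declared with `CREATE MATERIALIZED VIEW ... AS SELECT ...`, and the optimizer would pick it up on its own; here the "rewritten" query is written out by hand just to show that both forms return the same answer.

```python
import sqlite3

# In-memory database standing in for a warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical base table plus a pre-aggregated table that plays the role
# of a materialized view.
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('east', 15), ('west', 7);
    CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# The query as a user would write it, aggregating the base table.
direct = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# The equivalent query reading the precomputed result instead.
rewritten = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()

print(direct == rewritten)
```

The second query skips the aggregation entirely, which is where the speedup comes from once the base table gets large.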