Hortonworks Data Platform 3.0 Released

Kevin Feasel

2018-07-17

Hadoop

Saumitra Buragohain, et al., announce the newest version of the Hortonworks Data Platform:

Highlighted Apache Hive features include:

  • Workload management for LLAP:  You can assign resource pools within the LLAP pool and allocate resources on a per-user or per-group basis. This enables support for large multi-tenant deployments.

  • ACID v2 and ACID on by default:  We are releasing ACID v2. With the performance improvements in both storage format and execution engine, we are seeing equal or better performance compared to non-ACID tables. Thus we are turning ACID on by default and enabling full support for data updates.

  • Hive Warehouse Connector for Spark:  Hive Warehouse Connector allows you to connect Spark applications with Hive data warehouses. The connector automatically handles ACID tables. This enables data science workloads to work well with data in Hive.

  • Materialized view navigation:  Materialized views allow you to pre-aggregate and pre-compute tables used in queries. They typically work best on sub-queries or intermediate tables. The cost-based optimizer will automatically plan a query around those intermediate results if they are available, drastically speeding up your queries.

  • Information schema:  Hive now exposes the metadata of the database (tables, columns, etc.) directly via the Hive SQL interface.

  • JDBC storage connector:  You can now map any JDBC database into Hive’s catalog. This means you can join data across Hive and other databases using the Hive query engine.

This looks pretty good.  So of course I learn about it two days after I rebuild my demo Hadoop cluster.
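
To make the materialized view bullet a bit more concrete, here is a minimal sketch of the underlying idea: pre-aggregate once, then answer queries from the smaller result. It uses Python's built-in sqlite3 module rather than Hive, and SQLite has no materialized views, so the `sales_by_region` summary table is a hand-built stand-in. The table names, columns, and helper functions are all made up for illustration. In Hive the optimizer would choose the pre-computed result on its own; here the two query paths are written out by hand.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a base table and a pre-aggregated copy."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount INTEGER);
        INSERT INTO sales VALUES ('east', 10), ('east', 15), ('west', 7);

        -- Hand-built stand-in for a materialized view (assumption: SQLite
        -- has no such feature, so we store the aggregate in a plain table).
        CREATE TABLE sales_by_region AS
            SELECT region, SUM(amount) AS total
            FROM sales
            GROUP BY region;
        """
    )
    return conn


def totals_from_base(conn):
    """Aggregate the base table at query time."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


def totals_from_summary(conn):
    """Read the pre-computed totals instead of re-aggregating."""
    query = "SELECT region, total FROM sales_by_region ORDER BY region"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(totals_from_base(conn))
    print(totals_from_summary(conn))
```

Both functions return the same rows; the second simply skips the aggregation work, which is the saving a real materialized view rewrite is after.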
