HDP 3.0 Released

Roni Fontaine and Saumitra Buragohain announce Hortonworks Data Platform version 3.0:

Other additional capabilities include:

  • Scalability and availability with NameNode federation, allowing customers to scale to thousands of nodes and a billion files. Higher availability with multiple name nodes and standby capabilities allow for the undisrupted, continuous cluster operations if a namenode goes down.

  • Lower total cost of ownership with erasure coding, providing a data protection method that up to this point has mostly been found in object stores. Hadoop 3 will no longer default to storing three full copies of each piece of data across its clusters. Instead of that 3x hit on storage, the erasure encoding method in Hadoop 3 will incur an overhead of 1.5x while maintaining the same level of data recoverability from disk failure. The end result will be a 50% savings in storage overhead, reducing it by half.

  • Real-time database, delivering improved query optimization to process more data at a faster rate by eliminating the performance gap between low-latency and high-throughput workloads. Enabled via Apache Hive 3.0, HDP 3.0 offers the only unified SQL solution that can seamlessly combine real-time & historical data, making both available for deep SQL analytics. New features such as                workload management enable fine grained resource allocation so no need to worry about resource competition. Materialized views pre-computes and caches the intermediate tables into views where the query optimizer will automatically leverage the pre-computed cache, drastically improve performance. The end result is faster time to insights.

  • Data science performance improvements around Apache Spark and Apache Hive integration. HDP 3.0 provides seamless Spark integration to the cloud. And containerized TensorFlow technical preview combined with GPU pooling delivers a deep learning framework that makes deep learning faster and easier.

Looks like it’s invite-only at the moment, but that should change pretty soon.  It also looks like I’ve got a new weekend project…

Related Posts

Overriding Spark Dependencies

Landon Robinson shows how to override a Spark dependency located on the classpath: This doesn’t draw the line exactly where the method changed from private to public, but generally speaking:– gson-2.2.4.jar: the method is private, and therefore too old for use here– gson-2.6.1: the method is public, and works fine.– Somewhere between the two, the […]

Read More

Kafka and MirrorMaker

Renu Tewari describes what MirrorMaker does for Kafka today and what is coming with version 2: Apache Kafka has become an essential component of enterprise data pipelines and is used for tracking clickstream event data, collecting logs, gathering metrics, and being the enterprise data bus in a microservices based architectures. Kafka is essentially a highly […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930