TPC-DS Testing With HDP 3.0

Nita Dembla and Gopal Vijayaraghavan compare HDP 3.0 versus HDP 2.6.5 when running the TPC-DS query set and note performance improvements in Hive LLAP:

Hortonworks announced the general availability of HDP 3.0 this year. You may read more about it here. Bundled with HDP 3.0, Apache Hive 3 with LLAP took a significant leap as a Enterprise Ready Real time Database Warehouse with transactional capabilities that continues to serve BI workloads with lower latencies. HDP 3.0 comes with exciting new capabilities – ACID support, materialized views, SQL constraints and Query result cache to name a few.  Additionally, we continued to build and improve on the performance enhancements introduced in earlier releases.
In this blog, we will provide an update on our performance benchmark blog, comparing performance of HDP 3.0 to HDP 2.6.5. The noteworthy difference in benchmark is that all tables are by default transactional and written in ACID format, which means there are additional metadata (ROW_ID) columns to uniquely identify each row and support transactional semantics. Another key database capability used and tested here is SQL constraints. The hive-testbench schema has been enhanced to declare Primary-Foreign key, not null and unique constraints.

Their headline is that Hive 3 is up to 2x faster than Hive 2, with huge gains in a few of the queries.

Related Posts

Notebooks in Azure Databricks

Brad Llewellyn takes us through Azure Databricks notebooks: Azure Databricks Notebooks support four programming languages, Python, Scala, SQL and R.  However, selecting a language in this drop-down doesn’t limit us to only using that language.  Instead, it makes the default language of the notebook.  Every code block in the notebook is run independently and we […]

Read More

Reading and Writing CSV Files with spark-dotnet

Ed Elliott continues a series on Spark for .NET: How do you read and write CSV files using the dotnet driver for Apache Spark? I have a runnable example here:https://github.com/GoEddie/dotnet-spark-examples Specifcally:https://github.com/GoEddie/dotnet-spark-examples/tree/master/examples/split-csv The quoted links will take you straight to the code, but click through to see Ed’s commentary.

Read More

Categories

December 2018
MTWTFSS
« Nov Jan »
 12
3456789
10111213141516
17181920212223
24252627282930
31