Working with Hive in HDInsight

Brad Llewellyn takes us through building an HDInsight cluster and writing Hive queries against it:

Hive is a “SQL on Hadoop” technology that combines the scalable processing framework of the ecosystem with the coding simplicity of SQL.  Hive is very useful for performant batch processing on relational data, as it leverages all of the skills that most organizations already possess.  Hive LLAP (Low Latency Analytical Processing or Live Long and Process) is an extension of Hive that is designed to handle low latency queries over massive amounts of EXTERNAL data.  One of this coolest things about the Hadoop SQL ecosystem is that the technologies allow us to create SQL tables directly on top of structured and semi-structured data without having to import it into a proprietary format.  That’s exactly what we’re going to do in this post.  You can read more about Hive here and here and Hive LLAP here.

We understand that SQL queries don’t typically constitute traditional data science functionality.  However, the Hadoop ecosystem has a number of unique and interesting data science features that we can explore.  Hive happens to be one of the best starting points on that journey.

Click through for the screenshot-laden demonstration.

Related Posts

Using the StreamSets Snowflake Destination

Dash Desai shows how you can use StreamSets to write data into SnowflakeDB: In particular, we’ll look at an example scenario that addresses Data Drift – where new information is added mid-stream and when that occurs the new table structure and new column values are created in Snowflake automatically. To illustrate, let’s take HTTP web server logs […]

Read More

Database-First or Kafka-First for Event Streaming

Gwen Shapiro takes us through a scenario where database-first writes for event streaming makes the most sense: Note that the DB does quite a lot for you: it enforces serializability, locks, your logical constraints, etc. If the DB is distributed (Vitesse, Cockroach, Spanner, Yugabyte), it does even more. If you were to go Kafka-first… well, […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

March 2019
MTWTFSS
« Feb  
 123
45678910
11121314151617
18192021222324
25262728293031