Integrating Spark With Hive

Rahul Kumar wants to write Scala code to access the Hive datastore:

Hello geeks, we have discussed how to start programming with Spark in Scala. In this blog, we will discuss how we can use Hive with Spark 2.0.

When you start to work with Hive, you need HiveContext (inherits SqlContext), core-site.xml,hdfs-site.xml, and hive-site.xml for Spark. In case you don’t configure hive-site.xml then the context automatically creates metastore_db in the current directory and creates warehousedirectory indicated by HiveConf(which defaults user/hive/warehouse).

Rahul has made his demo code available on GitHub.

Related Posts

Replicating Data In HDFS Between Clusters

Murali Ramasami and Niru Anisetti have an article showing how to use the Hortonworks Data Lifecycle Manager to set up replication between two Hadoop clusters: Data Lifecycle Manager (DLM) delivers on the promise of location-agnostic, secure replication by encapsulating and copying data seamlessly across physical private storage and public cloud environments. This empowers businesses to […]

Read More

Installing Confluent Platform On Windows

Niels Berglund shows how to install Confluent Platform (the Confluent branded version of Apache Kafka) on a Windows machine using the Windows Subsystem for Linux: WSL is primarily aimed at developers, and it allows you to run Linux environments directly on Windows in a native format and without the overhead of a virtual machine. Let us […]

Read More


September 2016
« Aug Oct »