Getting Started With Hadoop

Kevin Feasel

2016-03-25

Hadoop

Ginger Grant has some pointers on getting started with the Hortonworks Data Platform sandbox:

Previously, spinning up a virtual machine meant purchasing software. No more, as there is now an open source application. In the example shown here, the Linux operating system will be installed, but you can put any operating system you want on your virtual machine, provided of course you have a license for it. If you don’t feel comfortable installing non-released versions of code like SQL Server 2016 on your PC, a virtual machine is a great way to test it out. You will need to provide your own operating system, but there are trial versions you can use for limited periods of time as well. The open source virtual machine Oracle VM VirtualBox is the only open source version of a virtual machine software. You can download it here. This software is needed prior to installing the Hortonworks Sandbox. Obviously Hortonworks is not the only version of Hadoop available; Cloudera has a Hadoop VM too, which you can download as well. Personally I am not a huge fan of the Cloudera Manager, which is why I prefer Hortonworks, but either will work with PolyBase.

I’m personally a fan of VMware Player for VMs, but either will work well for the task.
