Using Azure Data Lake Store With Hadoop

Amit Kulkarni shows how to make Azure Data Lake Store the default file system for a Hadoop cluster:

So to give a concrete example, if the default file system was hdfs://123.23.12.4344:9000, then /user/filename.txt would resolve to hdfs://123.23.12.4344:9000/user/filename.txt.
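As a rough illustration, that resolution can be reproduced with Hadoop's Path API in Java. makeQualified is pure URI manipulation, so no live cluster is needed; the host and port are simply the ones from the quoted example, and the class name is mine:

```java
import java.net.URI;

import org.apache.hadoop.fs.Path;

public class DefaultFsResolution {
    public static void main(String[] args) {
        // The default file system URI from the example above.
        URI defaultFs = URI.create("hdfs://123.23.12.4344:9000");

        // An unqualified path picks up the scheme and authority
        // of the default file system.
        Path resolved = new Path("/user/filename.txt")
                .makeQualified(defaultFs, new Path("/"));

        System.out.println(resolved);
        // hdfs://123.23.12.4344:9000/user/filename.txt
    }
}
```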

Why does the default file system matter? The first answer to this is purely convenience. It is a heck of a lot easier to simply say /events/sensor1/ than adl://amitadls.azuredatalakestore.net/events/sensor1/ in code and configurations. Secondly, many components in Hadoop use relative paths by default. For instance, there is a fixed set of places, specified by relative paths, where various applications generate their log files. Finally, many ISV applications running on Hadoop specify important locations by relative paths.
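To make the relative-path point concrete, here is a small sketch assuming Azure Data Lake Store has been made the default file system (the fs.defaultFS property in core-site.xml, which is what the linked post walks through). The account name is reused from the excerpt; the /user/hadoop working directory is hypothetical. A path with no leading slash resolves against the working directory, which itself lives on the default file system:

```java
import java.net.URI;

import org.apache.hadoop.fs.Path;

public class RelativePathDemo {
    public static void main(String[] args) {
        // What fs.defaultFS would be set to in core-site.xml
        // (account name reused from the excerpt above).
        URI defaultFs = URI.create("adl://amitadls.azuredatalakestore.net");

        // A hypothetical per-user working directory on the default store.
        Path workingDir = new Path("/user/hadoop");

        // A relative path, like the ones applications use for log locations,
        // resolves against the working directory on the default file system.
        Path logDir = new Path("logs/app1")
                .makeQualified(defaultFs, workingDir);

        System.out.println(logDir);
        // adl://amitadls.azuredatalakestore.net/user/hadoop/logs/app1
    }
}
```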

Read on to see how.

