Using Azure Data Lake Store With Hadoop

Amit Kulkarni shows how to make Azure Data Lake Store the default file system for a Hadoop cluster:

So to give a concrete example, if the default file system was hdfs://123.23.12.4344:9000 then the /user/filename.txt would resolve to hdfs://123.23.12.4344:9000/user/filename.txt.

Why does the default file system matter? The first answer to this is purely convenience. It is a heck lot easier to simply say /events/sensor1/ than adl://amitadls.azuredatalakestore.net/ in code and configurations. Secondly, many components in Hadoop use relative paths by default. For instance there are a fixed set of places, specified by relative paths, where various applications generate their log files. Finally, many ISV applications running on Hadoop specify important locations by relative paths.

Read on to see how.

Related Posts

Testing an Event-Driven System

Andy Chambers takes us through how to test an event-driven system: Each distinct service has a nice, pure data model with extensive unit tests, but now with new clients (and consequently new requirements) coming thick and fast, the number of these services is rapidly increasing. The testing guardian angel who sometimes visits your thoughts during […]

Read More

CosmosDB Continuation Tokens

Hasan Savran walks us through the idea of a continuation token in CosmosDB: In CosmosDB, TOP option is required and its default value is 100. You can change the default value by sending a different value using the request header “x-ms-max-item-count“. If you have 40000 rows in your Orders table, and run the same query […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728