Azure Data Lake Store Best Practices

Ust Oldfield provides recommendations on how to size and lay out files in Azure Data Lake Store:

The format of the file has a huge implication for the storage and parallelisation. Splittable formats – files which are row oriented, such as CSV – are parallelizable as data does not span extents. Non-splittable formats, however, – files what are not row oriented and data is often delivered in blocks, such as XML or JSON – cannot be parallelized as data spans extents and can only be processed by a single vertex.

In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the data lake store, but the metadata is contained in the data lake analytics. For many, working with the structured data in the data lake is very similar to working with SQL databases.

This is the type of thing that you can easily forget about, but it makes a huge difference down the line.

Related Posts

Considerations For Reducing I/O Costs

Monica Rathbun gives a few methods for reducing how many I/O operations a query requires: Implicit Conversions Implicit conversions often happen when a query is comparing two or more columns with different data types. In the below example, the system is having to perform extra I/O in order to compare a varchar(max) column to an nvarchar(4000) […]

Read More

Data Classification In Power BI

Steve Hughes describes how Power BI data classification works: Power BI Privacy Levels “specify an isolation level that defines the degree that one data source will be isolated from other data sources”. After working through some testing scenarios and trying to discover the real impact to data security, I was unable to effectively show how […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

April 2017
MTWTFSS
« Mar  
 12
3456789
10111213141516
17181920212223
24252627282930