Data Transformation Tools In The Azure Space

James Serra gives us an overview of the major tools you would use for ETL and ELT in Azure:

If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake.  And much of this data will need to be transformed (i.e. cleaned and joined together – the “T” in ETL).  Since the data lake is just storage (i.e. Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data.  There is good news and bad news when it comes to which product to use.  The good news is there are a lot of products to choose from.  The bad news is there are a lot of products to choose from :-).  I’ll try to help your decision-making by talking briefly about most of the Azure choices and the best use cases for each when it comes to transforming data (although some of these products also do the Extract and Load part

The only surprise is that Azure Data Lake Analytics isn't mentioned, though the comments section has a good discussion explaining why.
