Data Cleansing Options with Azure

James Serra tries to give you an answer of when you should use different Azure services for data cleansing:

Clean the data and optionally aggregate it as it sits in source system.  The tool used for this would depend on the source system that stores the data (i.e. if SQL Server, you would use stored procedures).  The only benefit with this option is if you aggregate the data, you will move less data from the source system to Azure, which can be helpful if you have a small pipe to Azure and don’t need the row-level details.  The disadvantages are: the raw source data is not available in the data lake, so you would always need to go back to source system if you needed to get it again, and it may not even still exist in the source system; you would put extra stress on the source system when doing the cleaning which could affect end users using the system; it could take a long time to clean the data as the source system may not have fast performance; and you would not be able to use other tools (i.e. Hadoop, Databricks) to clean it.  Strongly advise against this option

Read on for additional options and James’s recommendations.

Related Posts

Azure SQL Database and Extended Events

Dave Bland shows how to set up and read an extended event file on Azure SQL Database: This first step when using T-SQL to read Extended Files that are stored in an Azure Storage Account is to create a database credential.  Of course the credential will provide essential security information to connect to the Azure […]

Read More

Azure SQL Database Serverless

Arun Sirpal takes us through Azure SQL Database Serverless: This is best used for those single databases that are ever changing with unpredictable patterns. With the concept of being billed per second (based on the vcores used) rather than per hour means that pricing can become more granular especially now with auto-pause becoming possible. The […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

May 2019
MTWTFSS
« Apr  
 12345
6789101112
13141516171819
20212223242526
2728293031