ETL / ELT – Page 27 – Curated SQL

Bulk Loading SQL Server from .NET

Published 2020-06-05 by Kevin Feasel

Adrian Hills walks us through the SqlBulkCopy class:

Ever been in a situation where rumblings of “process X is too slow” suddenly build into a super-high priority ball of urgency when that next step up in data volume hits? Yeah, that can be fun. No, really, it can be fun because we have strategies to sort this stuff out, right?
In this blog post, I’m going to talk about one particular piece of functionality—SqlBulkCopy—that can help you with bulk data loading. If I had to single out my favorite .NET class, SqlBulkCopy would be at the top of the list. My goal is to introduce you to this class so that maybe it can become a part of your tool belt, too.

Click through to see how it works. If you’re familiar with SSIS, you’re already familiar with the concept if not the specifics.
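
For readers who want the general shape of the idea before clicking through, here is a minimal sketch of batched loading. It is not SqlBulkCopy and not .NET: it uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in. The table name `readings`, the function `load_in_batches`, and the batch size are all hypothetical choices for illustration.

```python
import sqlite3


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows in batches: one executemany call and one commit per batch."""
    loaded = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One statement covers the whole batch instead of one round trip per row.
        conn.executemany("INSERT INTO readings (id, value) VALUES (?, ?)", batch)
        conn.commit()
        loaded += len(batch)
    return loaded


# Hypothetical demo data and table, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
rows = [(i, i * 0.5) for i in range(2500)]
loaded = load_in_batches(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

The point of the sketch is only the contrast with row-by-row inserts; the actual mechanics of SqlBulkCopy are covered in the linked post.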


Running Databricks ETL Code from Azure Data Factory

Published 2020-06-05 by Kevin Feasel

Ed Elliott shares several approaches for executing Databricks code from Azure Data Factory:

.NET
No post I write about Spark is going to leave out .NET; if you are writing C# or F#, then you can’t use a notebook anyway. If you do this, you will also include the .NET driver JAR file (a weird statement to make) and call that.

Read on for all of the techniques.


Feeding Databricks Output to Azure SQL Database

Published 2020-05-29 by Kevin Feasel

Arun Sirpal takes us through the process of moving data from Databricks into Azure SQL Database:

Recently I got to a stage where I leveraged Databricks to the best of my ability to join a couple of CSV files together, play around with some aggregations, and then output the result to a different mount point (based on Azure Storage) as a Parquet file. I decided that I actually wanted to move this data into Azure SQL DB, which you may want to do one day.

This isn’t just dropping files into Blob Storage and picking them up, but rather a direct integration.


When to Have Multiple Azure Data Factories

Published 2020-05-26 by Kevin Feasel

Paul Andrew explains how to become a factory mogul:

The obvious and easy reason for having multiple Data Factories could be that you simply want to separate your business processes. Maybe they all have separate data delivery requirements and it just makes management of data flows easier to handle. For example:
– Sales
– Finance
– HR
They could have different data delivery deadlines, process on different schedules, and not share any underlying connections.
You may also have multiple projects underway that mean you want to keep teams isolated.

But that’s not the only reason, so click through to learn several other reasons why you might have multiple Azure Data Factory instances running.


Using the SSIS Hadoop Components

Published 2020-05-12 by Kevin Feasel

Hadi Fadlallah walks us through the HDFS file source and destination components:

To test these components, we will create an SSIS package and add three connection managers:
1. Hadoop Connection Manager: to connect with the Hadoop cluster (check the previous article)
2. OLE DB Connection Manager: to connect to the SQL Server instance where the AdventureWorks2017 database is stored
3. Flat File Connection Manager: to export data from the HDFS source

I wonder if they ever fixed the 4K screen resolution problem (kind of tells you how often I use SSIS anymore…). That was one of the things which made these components unusable for me on any modern screen.


Replicating SQL’s IN Operator with Azure Data Factory

Published 2020-05-04 by Kevin Feasel

Rayis Imayev shows how we can find values in a group using Azure Data Factory:

However, only this use case for the OR function, with two conditions, is possible:
or(equals(variables('var1'), 'A'), equals(variables('var1'), 'B')) (limit of two conditions)
But what if we had the ability to check whether a particular element (a variable, parameter, or other ADF object value) belongs to a range of values (an array of values), similar to what we can do with the IN operator in SQL? This would definitely solve our problem and remove the limit on the number of logical conditions to check.

Click through for the answer.
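
To make the contrast concrete, here is a small sketch in Python rather than in ADF's expression language. It compares a two-branch OR check with a single membership test against an array, which is what SQL's IN does. The function names and the list of allowed values are hypothetical. ADF's expression language does have a contains function that accepts an array, but this sketch does not claim to reproduce the approach in the linked post.

```python
def matches_with_or(value: str) -> bool:
    # Mirrors the or(equals(...), equals(...)) pattern: one comparison per branch,
    # so every additional value needs another nested OR.
    return value == "A" or value == "B"


def matches_with_in(value: str, allowed: list) -> bool:
    # Mirrors SQL's IN: one membership test, however long the list grows.
    return value in allowed


# Hypothetical list of accepted values.
allowed_values = ["A", "B", "C", "D"]

print(matches_with_or("C"))
print(matches_with_in("C", allowed_values))
```

Adding a fifth value to the second form means changing data, not the condition itself.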


Case-Insensitive Power Query Merges

Published 2020-04-28 by Kevin Feasel

Ed Hansberry takes us through a pain point in Power Query:

Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data. Some never use the shift key, and others CAPSLAP everything.

Click through for your two options, plus a bonus option which might work.
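
As a rough illustration of the underlying problem, the sketch below joins two small datasets on a key after normalizing its case. This is plain Python, not Power Query M, and it is not necessarily any of the options in the linked post. The function `merge_ignoring_case` and the sample rows are hypothetical.

```python
def merge_ignoring_case(left, right, key):
    """Inner-join two lists of dicts on `key`, comparing key values without regard to case."""
    lookup = {}
    for row in right:
        # casefold() is a more aggressive lower() intended for caseless matching.
        lookup.setdefault(row[key].casefold(), []).append(row)

    merged = []
    for row in left:
        for match in lookup.get(row[key].casefold(), []):
            # Left-hand values win when both sides share a column name.
            merged.append({**match, **row})
    return merged


# Hypothetical data: the same customers typed with different casing.
orders = [{"customer": "ACME", "total": 100}, {"customer": "globex", "total": 250}]
regions = [{"customer": "Acme", "region": "East"}, {"customer": "Globex", "region": "West"}]

result = merge_ignoring_case(orders, regions, "customer")
```

The original casing from the left-hand table is preserved in the output; only the comparison is case-insensitive.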


Scheduling SSIS Packages in Azure

Published 2020-04-28 by Kevin Feasel

Magi Naumova takes us through the process of running SSIS in Azure Data Factory, including the scheduling of jobs to run our SSIS packages:

The main purpose of these tools is to support the lift-and-shift approach of migrating and running existing SSIS packages in Azure. I wouldn’t say that this is the most effective approach to transferring ETL to Azure, but it could be a good start on the road to a modern Azure data warehouse architecture. If you have already deployed SSIS packages in the Azure SSIS Catalog, then SSMS 18 helps you to put them on a schedule very quickly.
Running SSIS packages in Azure requires provisioning the SSIS runtime engine, an Azure Data Factory instance, and a SQL Database which hosts the SSIS catalog. Scheduling SSIS packages in Azure requires creating a data flow pipeline in ADF which has a trigger defined for scheduled execution. While describing all those concepts is far beyond the scope of this chapter, a short description would be useful.

Read on for a good amount of detail and a demo which walks through the process.


Unioning Datasets in Azure Data Factory

Published 2020-04-24 by Kevin Feasel

Rayis Imayev takes us through the Union Transformation in Mapping Data Flows:

Recently I had a chance to work on a special case of creating a file where the first two rows were static and contained output table metadata (column names and column types), so that a further reading process would be able to explicitly identify each column’s data type and consume the new file correctly in a single run.

Read on to see how Rayis solved this with Azure Data Factory.
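
The target file layout described in the quote can be sketched outside of Data Factory. The example below is plain Python using the standard csv module, not a Mapping Data Flow, and it only shows the output shape: a row of column names, a row of column types, then the data. The function name, column definitions, and sample rows are hypothetical.

```python
import csv
import io


def write_with_metadata_rows(columns, rows):
    """Return CSV text: row 1 = column names, row 2 = column types, then the data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([name for name, _ in columns])          # static row 1: names
    writer.writerow([col_type for _, col_type in columns])  # static row 2: types
    writer.writerows(rows)                                  # the actual data
    return buffer.getvalue()


# Hypothetical schema and data.
columns = [("id", "int"), ("name", "string")]
rows = [(1, "alpha"), (2, "beta")]

output = write_with_metadata_rows(columns, rows)
print(output)
```

Conceptually this is a union of three sets of rows, with the two metadata rows forced to the top.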


Using Azure Functions Inside Azure Data Factory

Published 2020-04-20 by Kevin Feasel

Rayis Imayev shows how you can call an Azure Function from inside your Azure Data Factory Pipeline:

Creating a data solution with Azure Data Factory (ADF) may look like a straightforward process: you have incoming datasets, business rules for how to connect and change them, and a final destination environment to save this transformed data. Very often your data transformation may require more complex business logic that can only be developed externally (scripts, functions, web services, Databricks notebooks, etc.).

In this blog post, I will try to share my experience of using Azure Functions in my Data Factory workflows: my highs and lows of using them, my victories and struggles to make them work.

This includes a description of the options, a demo function, and additional notes for each technique.



Category: ETL / ELT