
Azure Data Factory Triggers

Cathrine Wilhelmsen continues a series on Azure Data Factory by looking at triggers:

One important thing to note is that all times are in UTC. And since UTC does not observe daylight saving time… Well, let’s just say that if you need to execute pipelines during the workday and you have business users waiting for data, you may want to plan some trigger maintenance on the days when you fall back or spring forward. I know. Ugh 🙂 I’m hoping for better timezone support in the future 🤞🏻

Schedule triggers and pipelines have a many-to-many relationship. That means that one schedule trigger can execute many pipelines, and one pipeline can be executed by many schedule triggers.

Time-based triggers aren’t the only options, however—Cathrine also looks at the other three possibilities.
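If you're curious what a schedule trigger looks like outside the UI, here is a minimal sketch using the azure-mgmt-datafactory Python SDK (not something from Cathrine's post). The subscription, factory, and pipeline names are placeholders, and wiring one trigger to two pipelines is only there to illustrate the many-to-many relationship; note that the recurrence is defined in UTC, which is exactly the pain point above.

```python
# Minimal sketch: one schedule trigger starting two pipelines (the many-to-many bit),
# defined with the azure-mgmt-datafactory SDK. All names are placeholders.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    TriggerResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

trigger = ScheduleTrigger(
    # The recurrence is expressed in UTC, so a 06:00 trigger drifts by an hour
    # (local time) whenever daylight saving time starts or ends.
    recurrence=ScheduleTriggerRecurrence(
        frequency="Day",
        interval=1,
        start_time=datetime(2020, 1, 6, 6, 0, tzinfo=timezone.utc),
        time_zone="UTC",
    ),
    # One trigger can start many pipelines (and many triggers can start one pipeline).
    pipelines=[
        TriggerPipelineReference(
            pipeline_reference=PipelineReference(type="PipelineReference", reference_name=name)
        )
        for name in ("Copy Rebrickable Data", "Transform Rebrickable Data")
    ],
)

adf_client.triggers.create_or_update(rg, factory, "DailyUtcTrigger", TriggerResource(properties=trigger))
adf_client.triggers.begin_start(rg, factory, "DailyUtcTrigger").result()  # triggers are created in a stopped state
```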


Debugging Azure Data Factory Pipelines

Cathrine Wilhelmsen shows us how to debug Azure Data Factory pipelines:

You debug a pipeline by clicking the debug button:

Tadaaa! Blog post done? 😀

I joke, I joke, I joke. Debugging pipelines is a one-click operation, but there are a few more things to be aware of. In the rest of this post, we will look at what happens when you debug a pipeline, how to see the debugging output, and how to set breakpoints.

Turns out there’s more to it than clicking a button.
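Debug runs are a Data Factory Studio feature, but if you want a rough programmatic analogue of reading the Output pane, here is a sketch that triggers a pipeline run and dumps each activity's status and output with the azure-mgmt-datafactory Python SDK. All names are placeholders, and this is not what Cathrine's post covers.

```python
# Sketch: trigger a pipeline run and print each activity's status and output,
# a rough programmatic analogue of reading the debug Output pane.
import time
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

run = adf_client.pipelines.create_run(rg, factory, "<pipeline-name>", parameters={})

# Poll until the run finishes (a real script would add a timeout).
while adf_client.pipeline_runs.get(rg, factory, run.run_id).status in ("Queued", "InProgress"):
    time.sleep(10)

now = datetime.now(timezone.utc)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    rg, factory, run.run_id,
    RunFilterParameters(last_updated_after=now - timedelta(hours=1),
                        last_updated_before=now + timedelta(hours=1)),
)
for activity in activity_runs.value:
    print(activity.activity_name, activity.status, activity.output)
```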


Orchestrating ADF Pipelines

Cathrine Wilhelmsen continues a series on Azure Data Factory:

The other way to build this solution is by creating an orchestration pipeline with two execute pipeline activities. This gives us a little more flexibility than having a single pipeline, because we can execute each pipeline separately if we want to.

Let’s start by creating a new pipeline and adding two execute pipeline activities to it. In the activity settings, select the pipelines to execute, and check wait on completion:

Read on for the demonstration.
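For reference, here is roughly the same orchestration pattern sketched with the azure-mgmt-datafactory Python SDK. The child pipeline names are placeholders and assumed to exist already; wait_on_completion mirrors the "wait on completion" checkbox, and the depends_on line chains the two activities so the second runs only after the first succeeds.

```python
# Sketch: an orchestration pipeline containing two execute pipeline activities.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    ExecutePipelineActivity,
    PipelineReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

copy_step = ExecutePipelineActivity(
    name="Execute Copy Pipeline",
    pipeline=PipelineReference(type="PipelineReference", reference_name="<copy-pipeline>"),
    wait_on_completion=True,  # the "wait on completion" checkbox
)

transform_step = ExecutePipelineActivity(
    name="Execute Transform Pipeline",
    pipeline=PipelineReference(type="PipelineReference", reference_name="<transform-pipeline>"),
    wait_on_completion=True,
    # Chain the activities; drop depends_on to run both in parallel instead.
    depends_on=[ActivityDependency(activity="Execute Copy Pipeline", dependency_conditions=["Succeeded"])],
)

adf_client.pipelines.create_or_update(
    rg, factory, "Orchestrate Copy And Transform",
    PipelineResource(activities=[copy_step, transform_step]),
)
```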


Creating a Gen-2 Azure Data Lake Store

Cecilia Brusatori shares how to build a generation-2 data lake in Azure:

Finally, you’ve decided that Data Lake Gen 2 is good for your data analytics scenario and you’ve started the journey: you went to the Azure Portal and searched for it. Hmm, you don’t see it in the options to create it. Let’s try the search bar [typing Data Lake Gen2…]. Nothing… OK, maybe you’ve missed something… nope!
So what is, in fact, a Data Lake Gen 2? It is a blob storage account, optimized for data analytics.
Let’s take a look at how you can create it!

If you’re used to the first generation, where Azure Data Lake Storage was its own thing, it might take a minute to realize where it went.
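That "it's just a blob storage account" point carries over to scripting it, too: you create a general-purpose v2 storage account with the hierarchical namespace enabled. A minimal sketch with the azure-mgmt-storage Python SDK, where the names, location, and SKU are placeholders:

```python
# Sketch: an ADLS Gen2 account is a StorageV2 account with the hierarchical namespace on.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

storage_client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = storage_client.storage_accounts.begin_create(
    "<resource-group>",
    "<storageaccountname>",  # 3-24 lowercase letters and digits
    StorageAccountCreateParameters(
        location="westeurope",
        sku=Sku(name="Standard_LRS"),
        kind="StorageV2",        # general-purpose v2 (blob) storage account
        is_hns_enabled=True,     # the hierarchical namespace is what makes it "Gen2"
    ),
)
account = poller.result()
print(account.primary_endpoints.dfs)  # the Data Lake (dfs/abfss) endpoint
```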


Azure Data Factory Data Flows

Cathrine Wilhelmsen continues a series on Azure Data Factory:

So far in this Azure Data Factory series, we have looked at copying data. We have created pipelines, copy data activities, datasets, and linked services. In this post, we will peek at the second part of the data integration story: using data flows for transforming data.

But first, I need to make a confession. And it’s slightly embarrassing…

I don’t use data flows enough to keep up with all the changes and new features

To be fair to Cathrine, this is a rapidly changing part of ADF.


Time Series Anomaly Detection with Power BI

Leila Etaati takes us through time series anomaly detection with Cognitive Services and Power Query:

I am excited about this blog post; it is based on the new service in Cognitive Services named “Anomaly Detection,” which is now in preview.
I recorded a video about how it works in Cognitive Services: https://youtu.be/7ZOtZDbn6gM.

However, I am going to talk about how to use it in Power BI. In this post, first a brief introduction to anomaly detection will be presented, and then how it can be used inside Power BI will be discussed.

It sounds like there are still some rough edges, but they already have the makings of an interesting service.
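If you want to poke at the underlying API outside Power BI, it is a single REST call. Here is a rough Python sketch against the Anomaly Detector endpoint; the resource name and key are placeholders, and the path, payload shape, and field names reflect the preview API, so they may change.

```python
# Sketch: send a small time series to the Anomaly Detector (preview) REST API.
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<anomaly-detector-key>"

# The preview API expects at least 12 points at a fixed granularity.
values = [10, 11, 10, 12, 11, 10, 11, 95, 10, 11, 12, 10]
series = [
    {"timestamp": f"2019-01-{day:02d}T00:00:00Z", "value": value}
    for day, value in enumerate(values, start=1)
]

response = requests.post(
    f"{endpoint}/anomalydetector/v1.0/timeseries/entire/detect",
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"series": series, "granularity": "daily"},
)
response.raise_for_status()

for point, is_anomaly in zip(series, response.json()["isAnomaly"]):
    if is_anomaly:
        print("Anomaly at", point["timestamp"], "value", point["value"])
```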


Azure Data Factory Continued

Cathrine Wilhelmsen continues a series on Azure Data Factory. Catching up from the last time around, we first see the Copy Data activity:

You can copy data to and from more than 80 Software-as-a-Service (SaaS) applications (such as Dynamics 365 and Salesforce), on-premises data stores (such as SQL Server and Oracle), and cloud data stores (such as Azure SQL Database and Amazon S3). During copying, you can define and map columns implicitly or explicitly, convert file formats, and even zip and unzip files – all in one task.

Yeah. It’s powerful 🙂 But how does it really work?
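To make the "all in one task" idea a bit more concrete, here is roughly what a copy data activity looks like when scripted with the azure-mgmt-datafactory Python SDK. It assumes the source and destination datasets (and their linked services) already exist, the names are placeholders, and the blob-to-blob combination is just one of the many source/sink pairings.

```python
# Sketch: a pipeline with a single copy data activity (blob to blob).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

copy_activity = CopyActivity(
    name="Copy Source To Destination",
    inputs=[DatasetReference(type="DatasetReference", reference_name="<source-dataset>")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="<destination-dataset>")],
    source=BlobSource(),   # source/sink types vary per connector (SQL, S3, ...)
    sink=BlobSink(),
)

adf_client.pipelines.create_or_update(
    rg, factory, "Copy Data Pipeline", PipelineResource(activities=[copy_activity])
)
adf_client.pipelines.create_run(rg, factory, "Copy Data Pipeline", parameters={})
```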

Then Cathrine hits datasets:

But… please, please, please don’t use “source” or “destination” or “sink” or “input” or “output” or anything like that in your dataset names. It makes sense when you have one pipeline with one copy data activity, but as soon as you start building out your solution, it can get messy. Because what if you realize you want to use the original destination dataset as a source dataset in another copy data activity? Yeah… 🙂

So! Let’s rename the datasets.

After that, it’s on to linked services:

Azure Key Vault is a service for storing and managing secrets (like connection strings, passwords, and keys) in one central location. By storing secrets in Azure Key Vault, you don’t have to expose any connection details inside Azure Data Factory. You can connect to “the application database” without directly seeing the server, database name, or credentials used.
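As a sketch of that pattern (again with the azure-mgmt-datafactory Python SDK; the vault URL, secret name, and linked service names are placeholders), the connection string itself never shows up in the factory:

```python
# Sketch: a Key Vault linked service, plus an Azure SQL Database linked service
# whose connection string is pulled from a Key Vault secret at runtime.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureKeyVaultLinkedService,
    AzureKeyVaultSecretReference,
    AzureSqlDatabaseLinkedService,
    LinkedServiceReference,
    LinkedServiceResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

adf_client.linked_services.create_or_update(
    rg, factory, "AzureKeyVault",
    LinkedServiceResource(properties=AzureKeyVaultLinkedService(
        base_url="https://<your-vault>.vault.azure.net/"
    )),
)

adf_client.linked_services.create_or_update(
    rg, factory, "ApplicationDatabase",
    LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
        connection_string=AzureKeyVaultSecretReference(
            store=LinkedServiceReference(type="LinkedServiceReference", reference_name="AzureKeyVault"),
            secret_name="application-database-connection-string",
        )
    )),
)
```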

Cathrine is rolling with this series and it’s been great so far.


Build and Deploy SSIS Projects with Azure DevOps

Joost van Rossum has a pair of posts on Azure DevOps updates. First, Azure DevOps supports building SSIS projects:

This new task is much easier to use than the PowerShell code and also easier than most of the third-party tasks. With a little practice, you can now easily create a build task in under two minutes, which is probably faster than the build itself.

If your build fails with the following error message, then you are probably using a custom task or component (like the Blob Storage Download Task). These tasks are not installed on the build agents hosted by Microsoft. The solution is to use a self-hosted agent where you can install all custom components.

Second, Azure DevOps supports deploying SSIS projects:

Microsoft just released the SSIS Deploy task (public preview) which makes it much easier to deploy an SSIS project. Below you will find the codeless steps to deploy artifacts created by the SSIS Build task.

Click through for the step-by-step instructions for each.


Azure Data Factory Pipelines

Cathrine Wilhelmsen continues a series on Azure Data Factory with a discussion of pipelines:

Pipelines are sorted by name, so I recommend that you decide on a naming convention early in your project. And yeah, I keep saying this to everyone else, but then I can never decide on how to name my own pipelines, haha 🙂 Don’t worry if you end up renaming your pipelines several times while you work on your project. It happens, and that’s completely fine, but try to stick to some kind of naming convention throughout your project.

In addition to naming conventions, you can create folders to organize your pipelines. Click the actions ellipsis next to the pipelines group, then click new folder.

Read on for more.
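Folders are part of the pipeline's resource definition too, so whatever convention you settle on can be scripted. A small sketch with the azure-mgmt-datafactory Python SDK, where the pipeline name, folder, and placeholder activity are just examples:

```python
# Sketch: save a pipeline into a folder (with a placeholder wait activity).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import PipelineFolder, PipelineResource, WaitActivity

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

adf_client.pipelines.create_or_update(
    rg, factory,
    "Copy Rebrickable Data",                        # pick a naming convention and stick to it
    PipelineResource(
        activities=[WaitActivity(name="Placeholder", wait_time_in_seconds=1)],
        folder=PipelineFolder(name="Lego/Copy"),    # folders keep the pipeline list organized
    ),
)
```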


Azure Data Factory Components and Copy Data Wizard

Cathrine Wilhelmsen continues a series on Azure Data Factory. First, we get an overview of the available components:

Pipelines are the things you execute or run in Azure Data Factory, similar to packages in SQL Server Integration Services (SSIS). This is where you define your workflow: what you want to do and in which order. For example, a pipeline can first copy data from an on-premises data center to Azure Data Lake Storage, and then transform the data from Azure Data Lake Storage into Azure Synapse Analytics (previously Azure SQL Data Warehouse).
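To make the hierarchy behind that workflow concrete, here is a compressed sketch of how the components stack when scripted with the azure-mgmt-datafactory Python SDK: a linked service for the connection, datasets on top of it, and a copy activity inside a pipeline. The connection string, container paths, and names are all placeholders.

```python
# Sketch: the component hierarchy - linked service -> dataset -> activity -> pipeline.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset,
    AzureStorageLinkedService,
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    DatasetResource,
    LinkedServiceReference,
    LinkedServiceResource,
    PipelineResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

# 1. Linked service: the connection to a store.
adf_client.linked_services.create_or_update(
    rg, factory, "BlobStorage",
    LinkedServiceResource(properties=AzureStorageLinkedService(
        connection_string=SecureString(value="<storage-connection-string>")
    )),
)

# 2. Datasets: the shape and location of data on top of the linked service.
for name, path in (("InputFiles", "raw"), ("OutputFiles", "curated")):
    adf_client.datasets.create_or_update(
        rg, factory, name,
        DatasetResource(properties=AzureBlobDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference", reference_name="BlobStorage"
            ),
            folder_path=path,
        )),
    )

# 3. Activity and pipeline: what to do, and in which order.
copy = CopyActivity(
    name="Copy Raw To Curated",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputFiles")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputFiles")],
    source=BlobSource(),
    sink=BlobSink(),
)
adf_client.pipelines.create_or_update(rg, factory, "Copy Files", PipelineResource(activities=[copy]))
```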

Then, Cathrine looks at the Copy Data wizard:

LEGO! Yay! I love LEGO. Rebrickable is an online service that will show you which LEGO sets you can build from the sets and parts you already own. Fun! 🙂

They also have a database of all official LEGO sets and parts (including themes and colors) that you can download for free as CSV files or JSON files.

The CSV files are automatically generated at the start of each month and can be found at rebrickable.com/downloads.

Cathrine takes this LEGO data and feeds it into Azure Data Lake Storage.
