
Category: Cloud

Orchestrating ADF Pipelines

Cathrine Wilhelmsen continues a series on Azure Data Factory:

The other way to build this solution is by creating an orchestration pipeline with two execute pipeline activities. This gives us a little more flexibility than having a single pipeline, because we can execute each pipeline separately if we want to.

Let’s start by creating a new pipeline and adding two execute pipeline activities to it. In the activity settings, select the pipelines to execute, and check wait on completion:

Read on for the demonstration.
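
For a rough idea of what that orchestration pipeline looks like under the covers, here is a minimal Python sketch that builds the pipeline JSON as a dictionary. The pipeline and activity names are hypothetical, and chaining the second activity to the first on success is my assumption rather than something the excerpt specifies.

```python
import json


def execute_pipeline_activity(name, pipeline_name, depends_on=None):
    """Build one Execute Pipeline activity definition as a plain dict."""
    activity = {
        "name": name,
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": {
                "referenceName": pipeline_name,  # hypothetical pipeline name
                "type": "PipelineReference",
            },
            # Equivalent to checking "wait on completion" in the activity settings.
            "waitOnCompletion": True,
        },
    }
    if depends_on:
        # Assumption: run this activity only after the named one succeeds.
        activity["dependsOn"] = [
            {"activity": depends_on, "dependencyConditions": ["Succeeded"]}
        ]
    return activity


orchestrator = {
    "name": "Orchestrate_Example",  # hypothetical name
    "properties": {
        "activities": [
            execute_pipeline_activity("Run First", "Pipeline_One"),
            execute_pipeline_activity(
                "Run Second", "Pipeline_Two", depends_on="Run First"
            ),
        ]
    },
}

print(json.dumps(orchestrator, indent=2))
```

Because each child pipeline is only referenced by name, you can still run either one on its own, which is the flexibility Cathrine mentions.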


Creating a Gen-2 Azure Data Lake Store

Cecilia Brusatori shares how to build a generation-2 data lake in Azure:

Finally, you’ve decided that Data Lake Gen 2 is good for your Data Analytics Scenario and you’ve started the journey, went to the Azure Portal and searched for it. Mhh you don’t see it in the options to create it, let’s try the search bar [typing Data Lake Gen2….] Nothing… Ok maybe you’ve missed something…. nope!
So what is in fact a Data Lake Gen 2? It is a blob storage account, optimized for Data Analytics.
Let’s take a look at how you are able to create it!

If you’re used to the first generation, where Azure Data Lake Storage was its own thing, it might take a minute to realize where it went.
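
To make the "it is a blob storage account" point concrete, here is a small Python sketch of what the resource might look like in an ARM template. The account name and region are hypothetical. The part that matters is the hierarchical namespace flag (`isHnsEnabled`) on a general-purpose v2 account, which the excerpt does not spell out but which is the setting that turns a storage account into Data Lake Storage Gen2.

```python
import json

# Hypothetical account name and region, for illustration only.
storage_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "apiVersion": "2019-06-01",
    "name": "examplegen2lake",
    "location": "eastus",
    "kind": "StorageV2",  # general-purpose v2 storage account
    "sku": {"name": "Standard_LRS"},
    # Enabling the hierarchical namespace is what makes this a Gen2 data lake.
    "properties": {"isHnsEnabled": True},
}

print(json.dumps(storage_account, indent=2))
```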


Azure Data Factory Data Flows

Cathrine Wilhelmsen continues a series on Azure Data Factory:

So far in this Azure Data Factory series, we have looked at copying data. We have created pipelines, copy data activities, datasets, and linked services. In this post, we will peek at the second part of the data integration story: using data flows for transforming data.

But first, I need to make a confession. And it’s slightly embarrassing…

I don’t use data flows enough to keep up with all the changes and new features.

To be fair to Cathrine, this is a rapidly-changing part of ADF.


Time Series Anomaly Detection with Power BI

Leila Etaati takes us through time series anomaly detection with Cognitive Services and Power Query:

I am excited about this blog post, which is based on the new service in Cognitive Services named “Anomaly Detection”, now in preview.
I recorded a video about how it works in Cognitive Services: https://youtu.be/7ZOtZDbn6gM.

However, I am going to talk about how to use it in Power BI. In this post first, a brief introduction to the anomaly detection will be presented, then how it can be used inside Power BI will be discussed.

It sounds like there are still some rough edges, but they already have the makings of an interesting service.


Azure Data Factory Continued

Cathrine Wilhelmsen continues a series on Azure Data Factory. Catching up from the last time around, we first see the Copy Data activity:

You can copy data to and from more than 80 Software-as-a-Service (SaaS) applications (such as Dynamics 365 and Salesforce), on-premises data stores (such as SQL Server and Oracle), and cloud data stores (such as Azure SQL Database and Amazon S3). During copying, you can define and map columns implicitly or explicitly, convert file formats, and even zip and unzip files – all in one task.

Yeah. It’s powerful 🙂 But how does it really work?

Then Cathrine hits datasets:

But… please, please, please don’t use “source” or “destination” or “sink” or “input” or “output” or anything like that in your dataset names. It makes sense when you have one pipeline with one copy data activity, but as soon as you start building out your solution, it can get messy. Because what if you realize you want to use the original destination dataset as a source dataset in another copy data activity? Yeah… 🙂

So! Let’s rename the datasets.

After that, it’s on to linked services:

Azure Key Vault is a service for storing and managing secrets (like connection strings, passwords, and keys) in one central location. By storing secrets in Azure Key Vault, you don’t have to expose any connection details inside Azure Data Factory. You can connect to “the application database” without directly seeing the server, database name, or credentials used.

Cathrine is rolling with this series and it’s been great so far.
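
As a sketch of the Key Vault idea, here are two linked service definitions built as Python dictionaries: one pointing at the vault, and one database linked service that pulls its connection string from a secret instead of storing it inline. The linked service names, vault URL placeholder, and secret name are all hypothetical.

```python
import json

# Linked service for the vault itself. The vault name is a placeholder.
key_vault_ls = {
    "name": "KeyVault_Example",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {"baseUrl": "https://<your-vault-name>.vault.azure.net"},
    },
}

# Database linked service that references a secret rather than embedding
# the server, database name, or credentials.
sql_ls = {
    "name": "ApplicationDatabase",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": key_vault_ls["name"],
                    "type": "LinkedServiceReference",
                },
                "secretName": "ApplicationDatabase-ConnectionString",  # hypothetical
            }
        },
    },
}

print(json.dumps(sql_ls, indent=2))
```

Anyone reading the second definition sees only the secret's name, which is the "connect without seeing the details" behavior the excerpt describes.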


Build and Deploy SSIS Projects with Azure DevOps

Joost van Rossum has a pair of posts on Azure DevOps updates. First, Azure DevOps supports building SSIS projects:

This new task is much easier to use than the PowerShell code and also easier than most of the third party tasks. With a little practice you can now easily create a build task under two minutes which is probably faster than the build itself.

If your build fails with the following error message then you are probably using a custom task or component (like Blob Storage Download Task). These tasks are not installed on the build agents hosted by Microsoft. The solution is to use a self-hosted agent where you can install all custom components.

Second, Azure DevOps supports deploying SSIS projects:

Microsoft just released the SSIS Deploy task (public preview) which makes it much easier to deploy an SSIS project. Below you will find the codeless steps to deploy artifacts created by the SSIS Build task.

Click through for the step-by-step instructions for each.


Azure Data Factory Pipelines

Cathrine Wilhelmsen continues a series on Azure Data Factory with a discussion of pipelines:

Pipelines are sorted by name, so I recommend that you decide on a naming convention early in your project. And yeah, I keep saying this to everyone else, but then I can never decide on how to name my own pipelines, haha 🙂 Don’t worry if you end up renaming your pipelines several times while you work on your project. It happens, and that’s completely fine, but try to stick to some kind of naming convention throughout your project.

In addition to naming conventions, you can create folders to organize your pipelines. Click the actions ellipsis next to the pipelines group, then click new folder.

Read on for more.


Azure Data Factory Components and Copy Data Wizard

Cathrine Wilhelmsen continues a series on Azure Data Factory. First, we get an overview of the available components:

Pipelines are the things you execute or run in Azure Data Factory, similar to packages in SQL Server Integration Services (SSIS). This is where you define your workflow: what you want to do and in which order. For example, a pipeline can first copy data from an on-premises data center to Azure Data Lake Storage, and then transform the data from Azure Data Lake Storage into Azure Synapse Analytics (previously Azure SQL Data Warehouse).

Then, Cathrine looks at the Copy Data wizard:

LEGO! Yay! I love LEGO. Rebrickable is an online service that will show you which LEGO sets you can build from the sets and parts you already own. Fun! 🙂

They also have a database of all official LEGO sets and parts (including themes and colors) that you can download for free as CSV files or JSON files.

The CSV files are automatically generated at the start of each month and can be found on rebrickable.com/downloads

Cathrine takes this LEGO data and feeds it into Azure Data Lake Storage.


Incremental Data Moves to Azure Blob Storage

Ginger Daniel continues a series on moving data incrementally from SQL Server to Azure Blob Storage:

In Part 1 of this series, we demonstrated how to copy a full SQL database table from a SQL Server database into an Azure Blob Storage account as a csv file.  My client needed data moved from their on premise SQL Server database to Azure, and then needed the daily incremental data changes uploaded as well.  This article will discuss how to upload the incremental data changes to Azure after the initial data load.

Click through for the process.
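
One common way to handle the "daily incremental changes" piece is a high-watermark query: remember the latest modification time you have already exported, and pull only rows newer than that. The sketch below shows the idea in Python with an in-memory SQLite table standing in for SQL Server. The table, columns, and sample rows are hypothetical, and this is not necessarily the approach Ginger takes in the article.

```python
import csv
import io
import sqlite3


def export_changes(conn, last_watermark):
    """Return (csv_text, new_watermark) for rows modified after last_watermark."""
    rows = conn.execute(
        "SELECT id, name, modified_at FROM customers "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_watermark,),
    ).fetchall()

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "name", "modified_at"])
    writer.writerows(rows)

    # Advance the watermark only when new rows were actually exported.
    new_watermark = rows[-1][2] if rows else last_watermark
    return buffer.getvalue(), new_watermark


# Hypothetical source table with a last-modified column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, modified_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Ann", "2019-12-01T08:00:00"),
        (2, "Bob", "2019-12-02T09:30:00"),
        (3, "Cy", "2019-12-03T10:15:00"),
    ],
)

# Pretend the initial full load covered everything through December 1st.
csv_text, watermark = export_changes(conn, "2019-12-01T23:59:59")
print(csv_text)
```

The CSV text is what you would upload to blob storage, and the returned watermark is what you would persist for the next run.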


Changes to EC2 Metadata Service

Praveen Sripati takes a look at changes to the AWS EC2 Instance Metadata Service following attacks against Capital One and dozens of other organizations:

Capital One Bank (1) and 30 different organizations were hacked around end of July, I have written a blog (1) around the same time on how to recreate the hack in your own AWS account and also a few mitigations around the same. Now, AWS has made a few changes to the AWS EC2 Instance Metadata Service (IMDS) around the same (12). AWS re:Invent 2019 session (1) around the same has also been planned on December 5th, 2019. Will update with the link once the recording of the session has been uploaded.

The old/existing approach is called IMDSv1 and the new one IMDSv2. Although IMDSv1 solves a few problems like not storing the access keys on the EC2, it brought its own headaches which led to the hacks.

Click through to see what these problems were and how they led to IMDSv2.
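
For a feel of what changes on the client side, here is a minimal Python sketch of the IMDSv2 flow: first a PUT request to obtain a session token, then a GET that presents the token in a header. The helper function names are my own, and `fetch_metadata` only works when run on an EC2 instance, so the sketch builds the requests without sending anything by default.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds=21600):
    """IMDSv2 step 1: a PUT request asking for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """IMDSv2 step 2: a GET request that carries the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path, timeout=2):
    """Run both steps. Only meaningful on an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_metadata_request(path, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()


token_request = build_token_request()
print(token_request.get_method(), token_request.full_url)
```

With IMDSv1, the second request alone (minus the token header) was enough, which is why a simple server-side request forgery could reach the metadata service.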
