Press "Enter" to skip to content

Author: Kevin Feasel

Creating a Gen-2 Azure Data Lake Store

Cecilia Brusatori shares how to build a generation-2 data lake in Azure:

Finally, you’ve decided that Data Lake Gen 2 is good for your Data Analytics Scenario and you’ve started the journey, went to the Azure Portal and searched for it. Mhh you don’t see it in the options to create it, let’s try the search bar [typing Data Lake Gen2….] Nothing… Ok maybe you’ve missed something…. nope!
So what is in fact a Data Lake Gen 2? it is a blob storage account, optimized for Data Analytics.
Let’s take a look at how you are able to create it!

If you’re used to the first generation, where Azure Data Lake Storage was its own thing, it might take a minute to realize where it went.

Comments closed

PARALLEL_REDO_FLOW_CONTROL Waits on Availability Groups

Taryn Pratt goes through a short outage at Stack Overflow:

While I can’t be 100% sure of the trigger, I’m 99.9% sure, because the job was running before the outage, so the timing is right. After looking through our monitoring logs, everything pointed to the job being the cause, so yes, I’m confident it caused it.

We don’t have regular maintenance windows for any of our servers, so we run jobs throughout the week, and if possible, try to schedule them during low-usage times. In this case, the job was an index maintenance job.

Now, before you scream at me about running an index maintenance job, I’m not going to argue the pros and cons of using it or whether or not we should run it — we can do that at another time. For this post, just accept the fact that we were running a job to rebuild/reorganize indexes

This is an interesting after-analysis of an outage. I have a lot of respect for people who can put these together and make them public—I would have a lot of trouble doing that myself.

Comments closed

Azure Data Factory Data Flows

Cathrine Wilhelmsen continues a series on Azure Data Factory:

So far in this Azure Data Factory series, we have looked at copying data. We have created pipelinescopy data activitiesdatasets, and linked services. In this post, we will peek at the second part of the data integration story: using data flows for transforming data.

But first, I need to make a confession. And it’s slightly embarrassing…

I don’t use data flows enough to keep up with all the changes and new features

To be fair to Cathrine, this is a rapidly-changing part of ADF.

Comments closed

Converting Databricks Notebooks to ipynb

Dave Wentzel shows how we can convert a Databricks notebook (in DBC format) to a normal Jupyter notebook (in ipynb format):

Databricks natively stores it’s notebook files by default as DBC files, a closed, binary format. A .dbc file has a nice benefit of being self-contained. One dbc file can consist of an entire folder of notebooks and supporting files. But other than that, dbc files are frankly obnoxious.

Read on to see how to convert between these two formats.

Comments closed

Web Scraping with F#

Jamie Dixon walks us through scraping a webpage using F#:

I need to go through all 8 pages of the grid and download the .pdfs that are associated with the “View Report” link. The challenge in this particular site is that they didn’t do any url parameters so there is no way to go through the grid via the uri. Looking at the page source, they are using ASP.NET and in typical enterprise-derpy manner, named their table “GridView1”

The way to get to the next page is to press on the “Next” link defined like this:

They over-achieved in the bloated View State for a simple page category though.

#Sigh

The code is straightforward and available as a Gist in the post.

Comments closed

Querying SQL Server from Python

Hasan Savran builds an Azure Data Studio notebook to query SQL Server from Python:

SQL Kernel is the default language, to query database with Python change SQL to Python 3. Probably, you will see the following message if this is the first time you are trying this. You need to install Python packages to be able to run python scripts. I have Visual Studio installed on my machine and I already have Python, I taught I could to use it by clicking “Use existing Python installation”. I was wrong, I couldn’t. This option looks for local installation files and when I point to Visual Studio Python files, it throws error in the middle of the installation. So, I will ignore this option for now.

In ADS, I haven’t gotten “Use existing Python location” to work either, so Hasan’s not alone in that regard.

Comments closed

Time Series Anomaly Detection with Power BI

Leila Etaati takes us through time series anomaly detection with Cognitive Services and Power Query:

I am excited about this blog post, this is based on the New service in Cognitive Service name “Anomaly Detection” which is now in Preview.
I recorded a video about how it works in cognitive service https://youtu.be/7ZOtZDbn6gM. 

However, I am going to talk about how to use it in Power BI. In this post first, a brief introduction to the anomaly detection will be presented, then how it can be used inside Power BI will be discussed.

It sounds like there are still some rough edges, but they already have the makings of an interesting service.

Comments closed

Azure Data Factory Continued

Cathrine Wilhelmsen continues a series on Azure Data Factory. Catching up from the last time around, we first see the Copy Data activity:

You can copy data to and from more than 80 Software-as-a-Service (SaaS) applications (such as Dynamics 365 and Salesforce), on-premises data stores (such as SQL Server and Oracle), and cloud data stores (such as Azure SQL Database and Amazon S3). During copying, you can define and map columns implicitly or explicitly, convert file formats, and even zip and unzip files – all in one task.

Yeah. It’s powerful 🙂 But how does it really work?

Then Cathrine hits datasets:

But… please, please, please don’t use “source” or “destination” or “sink” or “input” or “output” or anything like that in your dataset names. It makes sense when you have one pipeline with one copy data activity, but as soon as you start building out your solution, it can get messy. Because what if you realize you want to use the original destination dataset as a source dataset in another copy data activity? Yeah… 🙂

So! Let’s rename the datasets.

After that, it’s on to linked services:

Azure Key Vault is a service for storing and managing secrets (like connection strings, passwords, and keys) in one central location. By storing secrets in Azure Key Vault, you don’t have to expose any connection details inside Azure Data Factory. You can connect to “the application database” without directly seeing the server, database name, or credentials used.

Cathrine is rolling with this series and it’s been great so far.

Comments closed

Build and Deploy SSIS Projects with Azure DevOps

Joost van Rossum has a pair of posts on Azure DevOps updates. First, Azure DevOps supports building SSIS projects:

This new task is much easier to use than the PowerShell code and also easier than most of the third party tasks. With a little practice you can now easily create a build task under two minutes which is probably faster than the build itself.

If your build fails with the following error message then you are probably using a custom task or component (like Blob Storage Download Task). These tasks are not installed on the build agents hosted by Microsoft. The solution is to use a self hosted agent where you can install all custom components

Second, Azure DevOps supports deploying SSIS projects:

Microsoft just released the SSIS Deploy task (public preview) which makes it much easier to deploy an SSIS project. Below you will find the codeless steps to deploy artifacts created by the SSIS Build task.

Click through for the step-by-step instructions for each.

Comments closed