Press "Enter" to skip to content

Author: Kevin Feasel

Debugging Azure Data Factory Pipelines

Cathrine Wilhelmsen shows us how to debug Azure Data Factory pipelines:

You debug a pipeline by clicking the debug button:

Tadaaa! Blog post done? 😀

I joke, I joke, I joke. Debugging pipelines is a one-click operation, but there are a few more things to be aware of. In the rest of this post, we will look at what happens when you debug a pipeline, how to see the debugging output, and how to set breakpoints.

Turns out there’s more to it than clicking a button.

Leave a Comment

Column Alteration with Minimal Downtime

Andy Mallon shows how you can turn an integer column into a bigint column without disrupting your users:

Changing a column from int to bigint has gotten a lot easier since I started working on SQL Server back at the turn of the century. SQL Server 2016 introduced the ability to do ALTER TABLE...ALTER COLUMN as an online operation using the WITH (ONLINE=ON) syntax. This wonderful syntax now allows you to alter a column from int to bigint without causing major blocking. The int to bigint conversion is one of the most popular data type changes I see–A developer inevitably creates the table thinking they will never have more than 2 billion rows… then some years or months later 2 billion becomes a reality.

The DBA is left with the task of implementing that data type change, and now that it has almost 2 billion rows, it’s a well-established table and uptime during the change is a major consideration.

This is a great post from Andy. If you want to dig into the concept of near-zero downtime in more detail, I’ve got a series on the topic.

Leave a Comment

What Slows Down Clustered Index Rebuilds

Kevin Chant has a few reasons why you might see slow clustered index rebuilds in your environment:

I better point out that online rebuilds in general tend to take longer. Mostly because behind the scene’s it’s making a rebuilt copy of your index and then it swaps around to the new index once it has completed.

However, there is another key point I should mention here.

Kevin also points out a sub-item for online rebuilds which could fit just as well in offline rebuilds: if there’s a long-running transaction which blocks SQL Server from taking the schema modification lock, you’ll be sitting there until those long-running transactions ahead of you finish.

Leave a Comment

Tools for Using SQL Server on Linux

Kellyn Pot’vin-Gorman has a list of tools you can use to make working with SQL Server on Linux a bit easier:

Along with the above versions of Linux distributions, SQL Server 2019 is supported in a container scenario using a Docker image.  Running a SQL Server database inside a Docker engine with Linux offers more flexibility, faster recovery, and quicker deployments, including deployments into the Azure cloud. For those becoming familiar with Linux, Docker for Windows or Mac gives you the option to run a Docker engine on your workstation with SQL Server 2019 on Linux.

Along with Docker technology, orchestration can be achieved, both managing and deploying SQL Server containers on Linux using Red Hat Open shift or Kubernetes. This includes SQL Server 2019 Big Data Clusters (BDC), fully scalable clusters with SQL Server, Spark, and Hadoop File System (HDFS). BDCs provide the ability to read, write, and analyze big data with T-SQL or Spark, and you can combine big data and relational data, too.

The set of tools just happens to be almost exactly the same set of tools as for Windows, but there are a few differences.

Leave a Comment

NFL Kicker Quality

Jacob Long has an outstanding pair of posts on evaluating kickers in the NFL. FIrst up is the analysis itself:

Justin Tucker is so great that, quite frankly, it doesn’t matter which metric you use. PAA, FG% – eFG%, or just plain old FG%, he’s unlike anyone else in the past 10 years. Given the well-documented trend of increasing kicker accuracy in the NFL, I think Tucker has a solid claim on being the greatest kicker of all time.

Even with fewer seasons than many of his competitors, his PAA are double all the others who kicked in the past 10 years. He had a slightly more difficult than average set of attempts but made a higher percentage of his attempts than anyone who has had more than 22 tries. Good luck trying to find any defect in Tucker’s record.

Jacob then covers the method in detail:

Pasteur and Cunningham-Rhoads — I’ll refer to them as PC-R for short — gathered more data than most predecessors, particularly in terms of auxiliary environmental info. They have wind, temperature, and presence/absence of precipitation. They show fairly convincingly that while modeling kick distance is the most important thing, these other factors are important as well. PC-R also find the cardinal direction of every NFL stadium (i.e., does it run north-south, east-west, etc.) and use this information along with wind direction data to assess the presence of cross-winds, which are perhaps the trickiest for kickers to deal with. They can’t know about headwinds/tailwinds because as far as they (and I) can tell, nobody bothers to record which end zone teams defend at the game’s coin toss, so we don’t know without looking at video which direction the kick is going. They ultimately combine the total wind and the cross wind, suggesting they have some meaningful measurement error that makes them not accurately capture all the cross-winds. Using their logistic regressions that factor for these several factors, they calculate an eFG% and use it and its derivatives to rank the kickers.

Those wind factors make certain stadiums like New Era Field (where Buffalo plays) tricky: it’s fun to see two flags right next to each other pointing in opposite directions, or the flags on the field goal posts pointing hard right, then switching to hard left, then switching back to hard right over the course of a field goal try. H/T R-Bloggers

Leave a Comment

Azure Databricks and Delta Lake

Brad Llewellyn starts a new series on Delta Lake in Azure Databricks:

Saving the data in Delta format is as simple as replacing the .format(“parquet”) function with .format(“delta”).  However, we see a major difference when we look at the table creation.  When creating a table using Delta, we don’t have to specify the schema, because the schema is already strongly defined when we save the data.  We also see that Delta tables can be easily queried using the same SQL we’re used to.  Next, let’s compare what the raw files look like by examining the blob storage container that we are storing them in.

There are some good demos in this post and it promises to be a nice series.

Leave a Comment

Using SQL Server as a REST API Back-End

Davide Mauri shows how you can use SQL Server to power an API, using Flask as an example:

I mentioned in my previous article that having native JSON support in Azure SQL it’s a game changer as it profoundly change the way a developer can interact with a relational database, bringing the simplicity and the flexibility needed in today’s Modern Applications.

As Python is becoming immensely popular, one of the most common tasks for a developer is to create REST API using Python. Thanks to JSON support, using Azure SQL as a backend database to support your API is as easy as writing to a text file, with the difference that behind the scenes you have all the peace of mind that your data will be safely stored and made available on request, at scale, with also the option to push as much compute to data as you want, so that you can leverage the powerful query and processing engine while keeping your code simple, elegant and agile, with a clear separation of concerns. All these things will help you immensely once you’ll start to evolve your project to keep it updated with today’s demanding and ever-changing world.

Those who remember the days of ASMX web services in SQL Server (thankfully removed after 2005) might cringe, but I’ve actually done something like this for a company, where all of the data lived in SQL Server and the transformation logic was pretty simple. If you have to monkey with the JSON afterward in your middle tier, then just bring back a data set, but in a scenario like Davide shows, moving the JSON creation to Python wouldn’t really gain you anything.

Leave a Comment

Orchestrating ADF Pipelines

Cathrine Wilhelmsen continues a series on Azure Data Factory:

The other way to build this solution is by creating an orchestration pipeline with two execute pipeline activities. This gives us a little more flexibility than having a single pipeline, because we can execute each pipeline separately if we want to.

Let’s start by creating a new pipeline and adding two execute pipeline activities to it. In the activity settings, select the pipelines to execute, and check wait on completion:

Read on for the demonstration.

Leave a Comment

TDE Encryption Scan Internals

On the Microsoft Tech Community blog, goramesh shares with us how the initial encryption process works for Transparent Data Encryption:

Now, once encryption is turned ON for a database, all the existing user data on the data files should be encrypted. To do this, SQL Server starts something called a TDE Encryption Scan. It is basically a scanner, which goes through each page of each data file to ensure its encrypted. When the scanner completes its scan across all the files, that’s when we say that the database is ‘encrypted’. How the TDE Encryption scan works is crucial because of the effects it can have on the user workload. Let me explain. 

Read on for the explanation.

Leave a Comment

Conditional Formatting Line and Area Charts with Power BI

Soheil Bakkshi shows how we can conditionally format line and area charts with Power BI:

One of my customers asked me to show time series in line charts and area charts. But she want’s it to be conditionally formatted based on the average value over time. Let’s keep it simple, she wants to show “Sales by Year Month” in line chart, but, highlight the data points that are below “Average Sales per Year Month”. As you may know, we currently do not have the luxury of formatting line charts and area charts. But wait, this post is all about that. Let’s dig into it.

From the above scenario, you perhaps already guessed that we need to create a measure which defines the colour based on “Average Sales per Year Month” to be able to format the chart conditionally. If any data point is below the “Average Sales per Year Month” then we highlight it in Orange, if it is above the “Average Sales per Year Month” then we stick to the default colour.

Let’s do it.

This is definitely not straightforward, but once you see the process, it’s pretty neat.

Leave a Comment