Press "Enter" to skip to content

Author: Kevin Feasel

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity due to Python-based programmatic interface – the language of first choice by Data engineers and Data ops. The framework allows defining complex pipelines that move data around different parts, potentially implemented using different technologies.

The following article shows how to setup managed instance of Apache Airflow and define a very simple DAG (direct acyclic graph) of tasks that does the following:

  • Uses Azure registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP errors statistics based on Web log records for the last day.

Click through for the process.

Comments closed

New Script: sp_HealthParser

Erik Darling has a new script for us:

I recently found myself in the midst of a client issue that lead me to need a bunch of corroborating data from the system health extended event session.

There’s hardly any good documentation on it, and even fewer well-written queries or resources for parsing data out of it.

So now I’m making it easy for you, because I care about you more that Microsoft does.

Click through for a link to the script and to get an idea of what it provides.

Comments closed

Improving an Extra-Wide Column or Bar Chart

Mike Cisneros goes bar-hopping:

Let’s take a closer look at that extra-tall bar chart. I came across a graph like this one when working with a recent client. They created a visual to compare the quarterly sales revenues for each of 25 different sales associates against their individual sales targets.

Mike then gives us two routes to improving the quality of this visual, one which maintains the bar chart style and one which replaces it with a Cleveland dot plot.

Comments closed

Statistics and Ascending Keys in SQL Server

Matthew McGiffen talks stats:

The Ascending Key Problem relates to the most recently inserted data in your table which is therefore also the data that may not have been sampled and included in the statistics histograms. This sort of issue is one of the reasons it can be critical to update your statistics more regularly than the built-in automatic thresholds.

We’ll look at the problem itself, but also some of the mitigations that you can take to deal with it within SQL Server.

Read on to see how this behaved prior to SQL Server 2014’s new cardinality estimator and what has changed since then.

Comments closed

An End-to-End Microsoft Fabric Implementation

Jordan Witcombe takes us through an example:

This blog will walk you through the entire data lifecycle of ingesting data from SharePoint and Azure Blob Storage, through the lakehouse pattern, and finally to the reporting stage – all using Microsoft Fabric.

As we’re well into summer now, festivals can be a great way of spending time with friends and family. But all too often, we each want to see various acts and activities. How can we make this easier? With this dataset I created, it takes a simple Excel / CSV file with acts, stages, who wants to see this, and a personal rating. We will run this through the lakehouse pattern and finish with a curated/gold layer which can be used for reporting directly on top of. For more information on lake layers, you can see Ed’s blog Medallion Architecture: What is it?. So, you can highlight those clashes and keep everyone happy during your festival fun!

Read on for the showcase.

Comments closed

sp_HumanEventsBlockViewer Updates

Erik Darling has another update:

In this post, I’m going to talk about a couple cool changes to sp_HumanEventsBlockViewer, a procedure I wrote to analyze the blocked process report via Extended Events, and wish I had given a snazzier name to.

You see, when I wrote it, I pictured it as a utility script for sp_HumanEvents, which will set up the blocked process report and an extended event. But it turns out I use it a lot more on its own.

Read on for Erik’s update, including a neat trick around using an aggregate within a window function to generate ordering.

Comments closed

Tracking High-Level Power BI Import Throughput Stats

Chris Webb collects some measurements:

In the first post in this series I described the events in Log Analytics that can be used to understand throughput – the speed that Power BI can read from your dataset when importing data from it – during refresh. While the individual events are easy to understand when you look at a simple example they don’t make it easy to analyse the data in the real world, so here’s a KQL query that takes all the data from all these events and gives you one row per partition per refresh:

Click through for the KQL script, as well as what it all means and what you can get out of it.

Comments closed

The Basics of Fact-Dimensional Modeling

Nikola Ilic gives us a primer on Kimball-style fact and dimensional modeling:

Before we come up to explain why dimensional modelling is named like that – dimensional, let’s first take a brief tour through some history lessons. In 1996, a man called Ralph Kimball published a book “The Data Warehouse Toolkit”, which is still considered a dimensional modelling “Bible”. In his book, Kimball introduced a completely new approach to modelling data for analytical workloads, the so-called “bottom-up” approach. The focus is on identifying key business processes within the organization and modelling these first, before introducing additional business processes.

This is a really good overview of the topic, though I’m saddened that “dimensional bus matrix” didn’t make the cut of things to discuss. Mostly because I like the name “dimensional bus matrix.”

Comments closed

Updates to sp_QuickieStore and sp_PressureDetector

Erik Darling has been busy. FIrst, sp_QuickieStore:

The first thing on the list that I want to talk about is the ability to cycle through all databases that have Query Store enabled.

If you have a lot of databases with it turned on, it can be a real hassle to go through them all looking for doodads to diddle.

Now you can just do this:

Next, sp_PressureDetector:

I added  high-level disk metrics similar to what’s available in other popular scripts to mine. Why? Sometimes it’s worth looking at, to prove you should add more memory to a server so you’re less reliant on disk.

Especially in the cloud, where everything is an absolute hellscape of garbage performance that’s really expensive.

Click through for both sets of updates and thank Erik for his willingness to give so much to the community.

Comments closed

Power BI: Git Integration vs Deployment Pipelines

Marc Lelijveld notes a chemical reaction:

Together with the introduction of Microsoft Fabric a long-awaited feature for Power BI also became available. Git integration now allows us to connect your workspace to a git repository to sync the meta data from your Power BI dataset and report between the workspace and the repository.

At the same time, Power BI deployment pipelines are around. The pipelines help you to move content between workspaces from Development to Test and finally to Production. The question that arises, when to use what? And how do they complement each other, or maybe even clash with each other?

Read on for Marc’s thoughts across these two topics.

Comments closed