Press "Enter" to skip to content

Month: July 2023

T-SQL Tuesday 164 Roundup

Erik Darling is stuck on a feeling:

For this most recent T-SQL Tuesday, I challenged bloggers (using the term challenge weakly here) to think of the last time code they saw made them feel a feeling.

I wasn’t terribly specific about what kind of feelings were in play, and so I kind of expected some negative ones to creep in. Most of the results were overwhelmingly positive.

Click through for Erik’s roundup of entrants.

Testing a Database Restoration

Kevin Hill fixes a problem:

Pain Point: Something bad happened and you need to restore a SQL Server database.

Pain Point you didn’t know you had: The backup files are all corrupt due to a problem with the storage subsystem.

A backup is only as good as the last time you tested its restoration. Kevin shows just how easy it is to perform this test using dbatools.
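
To see what that test looks like in practice, here is a minimal sketch built around dbatools’ Test-DbaLastBackup command, which restores a database’s most recent backup to a second instance, runs DBCC CHECKDB against the copy, and then drops it. The instance and database names here are placeholders:

    # Restore the latest backup of the Sales database from ProdSql01
    # onto TestSql01, run DBCC CHECKDB on the restored copy, drop it,
    # and report whether both steps succeeded.
    Test-DbaLastBackup -SqlInstance 'ProdSql01' -Destination 'TestSql01' -Database 'Sales' |
        Format-Table SourceServer, Database, RestoreResult, DbccResult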

SQL Agent History on Azure SQL Managed Instances

Kenneth Fisher goes back in time:

The defaults for saving SQL Agent Job history are ok (at best), so you should probably check and update them if needed. Sadly, if you are using a Managed Instance this isn’t an option.

SQL Managed Instance currently doesn’t allow you to change any SQL Agent properties because they are stored in the underlying registry values.

That’s a real kick in the pants. Still, Kenneth shows us (via Jovan Popovic) a workaround to store the job history someplace else.
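
The general shape of the workaround is to periodically copy Agent history out of msdb into a table you control before the default retention trims it. A rough T-SQL sketch, with a hypothetical destination table (the linked post has the full details):

    -- One-time setup: a user table to hold copied job history.
    CREATE TABLE dbo.AgentJobHistory
    (
        instance_id int NOT NULL PRIMARY KEY,
        job_id      uniqueidentifier NOT NULL,
        step_id     int NOT NULL,
        run_status  int NOT NULL,
        run_date    int NOT NULL,
        run_time    int NOT NULL,
        message     nvarchar(4000) NULL
    );

    -- Scheduled step: append any history rows not yet captured.
    INSERT INTO dbo.AgentJobHistory
        (instance_id, job_id, step_id, run_status, run_date, run_time, message)
    SELECT h.instance_id, h.job_id, h.step_id, h.run_status, h.run_date, h.run_time, h.message
    FROM msdb.dbo.sysjobhistory AS h
    WHERE h.instance_id > (SELECT ISNULL(MAX(instance_id), 0) FROM dbo.AgentJobHistory);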

Prev and Next in KQL

Robert Cain lags and leads:

In this post we’ll continue our series on Kusto’s Windowing Functions by covering prev and next. If you’ve not read the introductory post, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these functions are used.

So what do prev and next do? They allow you to retrieve a value in a column from a previous row, or the next (or upcoming) row. This can be very useful in many situations. For instance, calculating the time between two rows based on a datetime column, or the change in a value from one row to the next.

As always, read on for a series of examples from Robert.
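
For a quick taste, here is a small sketch of the idea against an invented table of sensor readings; order by serializes the rowset, after which prev and next can reach into neighboring rows:

    // Serialize by the datetime column, then compare adjacent rows.
    Readings
    | order by Timestamp asc
    | extend TimeSinceLast = Timestamp - prev(Timestamp),
             ChangeInValue = Value - prev(Value),
             NextValue     = next(Value)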

Percentage by Group in R

Steven Sanderson performs a breakdown:

Calculating percentages by group is a common task in data analysis. It allows you to understand the distribution of data within different categories. In this blog post, we’ll walk you through the process of calculating percentages by group using three popular R packages: Base R, dplyr, and data.table. To keep things simple, we will use the well-known Iris dataset.

The Iris dataset contains information about different species of iris flowers and their measurements, including sepal length, sepal width, petal length, and petal width. We will focus on the ‘Species’ column and calculate the percentage of each species in the dataset.

Read on for the three approaches. I think the Tidyverse approach is the easiest to understand in this case, though all three get you to the answer.
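
As a small taste of the dplyr route (my own sketch, not necessarily Steven’s exact code):

    library(dplyr)

    # Count rows per species, then turn the counts into percentages.
    iris %>%
      count(Species) %>%
      mutate(percentage = 100 * n / sum(n))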

Power BI Dataset Data Source Credential Options

Angela Henry puts together a list:

If we are using enterprise-wide datasets, we don’t want Power BI datasets owned by individuals; we want them to be owned by a Service Principal so they aren’t relying on specific individuals when things go sideways (and because we all want to go on vacation at some point). However, it’s not always clear what credentials will actually be used for our data sources in our datasets when using a Service Principal. In a previous post, I talked about how to set up a service principal to take over a dataset when using data gateways, but one of the pre-requisites I listed was that your data sources needed to be configured with appropriate credentials. That’s where this post comes in.

You essentially have three options for data source credentials, depending on your data source type.

Click through for that list of three options, as well as the trade-offs inherent in each.

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity thanks to its Python-based programmatic interface – the language of first choice for data engineers and data ops. The framework allows defining complex pipelines that move data between parts potentially implemented using different technologies.

The following article shows how to set up a managed instance of Apache Airflow and define a very simple DAG (directed acyclic graph) of tasks that does the following:

  • Uses an Azure-registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP error statistics based on web log records for the last day.

Click through for the process.
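
For a rough idea of the shape, here is a minimal sketch of such a DAG, running a KQL query through the azure-kusto-data client inside a PythonOperator. The cluster URL, app registration values, database, and query are all placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    def run_kql_query():
        # Authenticate with a placeholder Azure AD app registration.
        kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
            "https://mycluster.westus.kusto.windows.net",
            "<client-id>", "<client-secret>", "<tenant-id>",
        )
        client = KustoClient(kcsb)
        # Hypothetical query: HTTP error counts from the last day of web logs.
        query = (
            "WebLogs"
            " | where Timestamp > ago(1d) and StatusCode >= 400"
            " | summarize Errors = count() by StatusCode"
        )
        client.execute("MyDatabase", query)

    with DAG(
        dag_id="adx_http_error_stats",
        start_date=datetime(2023, 7, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_kql_query", python_callable=run_kql_query)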

New Script: sp_HealthParser

Erik Darling has a new script for us:

I recently found myself in the midst of a client issue that led me to need a bunch of corroborating data from the system health extended event session.

There’s hardly any good documentation on it, and even fewer well-written queries or resources for parsing data out of it.

So now I’m making it easy for you, because I care about you more than Microsoft does.

Click through for a link to the script and to get an idea of what it provides.
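
For context on what the script saves you from, this is the sort of raw query you would otherwise start with, and everything interesting still has to be shredded out of the XML by hand:

    -- Pull the raw system_health XML from the ring buffer target.
    SELECT CAST(st.target_data AS xml) AS target_data
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS st
        ON s.address = st.event_session_address
    WHERE s.name = N'system_health'
        AND st.target_name = N'ring_buffer';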

Improving an Extra-Wide Column or Bar Chart

Mike Cisneros goes bar-hopping:

Let’s take a closer look at that extra-tall bar chart. I came across a graph like this one when working with a recent client. They created a visual to compare the quarterly sales revenues for each of 25 different sales associates against their individual sales targets.

Mike then gives us two routes to improving the quality of this visual, one which maintains the bar chart style and one which replaces it with a Cleveland dot plot.

An End-to-End Microsoft Fabric Implementation

Jordan Witcombe takes us through an example:

This blog will walk you through the entire data lifecycle of ingesting data from SharePoint and Azure Blob Storage, through the lakehouse pattern, and finally to the reporting stage – all using Microsoft Fabric.

As we’re well into summer now, festivals can be a great way of spending time with friends and family. But all too often, we each want to see different acts and activities. How can we make this easier? The dataset I created starts from a simple Excel/CSV file with acts, stages, who wants to see each act, and a personal rating. We will run this through the lakehouse pattern and finish with a curated/gold layer that reporting can sit directly on top of. For more information on lake layers, you can see Ed’s blog Medallion Architecture: What is it?. That way you can highlight any scheduling clashes and keep everyone happy during your festival fun!

Read on for the showcase.
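
If you want the ten-second version of the first hop, landing the raw CSV as a bronze Delta table from a Fabric notebook looks roughly like this; the path and table name are invented, and spark is the session Fabric notebooks provide:

    # Read the raw festival CSV from the lakehouse Files area
    # and land it as a bronze Delta table.
    df = spark.read.option("header", True).csv("Files/raw/festival_acts.csv")

    df.write.mode("overwrite").format("delta").saveAsTable("bronze_festival_acts")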
