Press "Enter" to skip to content

Curated SQL Posts

The Lakehouse is (Still) Not Enough

Nikola Ilic needs more than a lakehouse:

In the previous parts of the Data Modeling for mere mortals series, we examined traditional approaches to data modeling, with a focus on dimensional modeling and the importance of the star schema for business intelligence scenarios. Now, it’s time to introduce the concept of the modern data platform.

As usual, let’s take a more tool-agnostic approach and learn about some of the key characteristics of the modern data estate. Please don’t mind if I use some of the latest buzzwords related to this topic, but I promise to reduce their usage as much as possible.

Lakehouses are getting closer to being good enough, but the performance needs to be there, especially if you eventually put virtual data warehouses on top of lakehouse data to satisfy reporting tools’ need for structured fact-and-dimension data.

T-SQL Tuesday 164 Roundup

Erik Darling is stuck on a feeling:

For this most recent T-SQL Tuesday, I challenged bloggers (using the term challenge weakly here) to think of the last time code they saw made them feel a feeling.

I wasn’t terribly specific about what kind of feelings were in play, and so I kind of expected some negative ones to creep in. Most of the results were overwhelmingly positive.

Click through for Erik’s round-up of entrants.

Testing a Database Restoration

Kevin Hill fixes a problem:

Pain Point: Something bad happened and you need to restore a SQL Server database.

Pain Point you didn’t know you had: The backup files are all corrupt due to a problem with the storage subsystem.

A backup is only as good as the last time you tested its restoration. Kevin shows just how easy it is to perform this test using DBATools.
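Kevin’s exact commands are in the post, but as a minimal sketch of the idea: dbatools’ Test-DbaLastBackup restores the most recent backup of each database to a scratch copy, runs DBCC CHECKDB against it, and drops the copy, all in one line. The instance and database names below are placeholders.

```powershell
# Restore the latest backup of SalesDB on sql-prod-01 to a throwaway copy,
# run DBCC CHECKDB on it, then clean up. Names here are placeholders.
Test-DbaLastBackup -SqlInstance "sql-prod-01" -Database "SalesDB" |
    Select-Object Database, RestoreResult, DbccResult
```

If either the restore or the consistency check fails, you find out now, rather than during an outage.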

SQL Agent History on Azure SQL Managed Instances

Kenneth Fisher goes back in time:

The defaults for saving SQL Agent job history are ok (at best), so you should probably check and update them if needed. Sadly, if you are using a Managed Instance, this isn’t an option.

SQL Managed Instance currently doesn’t allow you to change any SQL Agent properties because they are stored in the underlying registry values.

That’s a real kick in the pants. Still, Kenneth shows us (via Jovan Popovic) a workaround to store the job history someplace else.
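Kenneth has the details; the general shape of such a workaround (the table design below is my own sketch, not necessarily what Kenneth or Jovan use) is a table in a user database plus a scheduled job that copies over any history rows it hasn’t captured yet, before the default retention trims them from msdb.

```sql
-- Illustrative archive table; column choices mirror msdb.dbo.sysjobhistory.
CREATE TABLE dbo.JobHistoryArchive
(
    instance_id  int            NOT NULL PRIMARY KEY,
    job_name     sysname        NOT NULL,
    step_id      int            NOT NULL,
    step_name    sysname        NOT NULL,
    run_status   int            NOT NULL,
    run_date     int            NOT NULL,
    run_time     int            NOT NULL,
    run_duration int            NOT NULL,
    message      nvarchar(4000) NULL
);
GO

-- Schedule this as its own Agent job; it only pulls rows it hasn't seen.
INSERT INTO dbo.JobHistoryArchive
SELECT h.instance_id, j.name, h.step_id, h.step_name,
       h.run_status, h.run_date, h.run_time, h.run_duration, h.message
FROM msdb.dbo.sysjobhistory AS h
    INNER JOIN msdb.dbo.sysjobs AS j
        ON j.job_id = h.job_id
WHERE h.instance_id > (SELECT ISNULL(MAX(a.instance_id), 0)
                       FROM dbo.JobHistoryArchive AS a);
```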

Prev and Next in KQL

Robert Cain lags and leads:

In this post we’ll continue our series on Kusto’s Windowing Functions by covering prev and next. If you’ve not read the introductory post, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these functions are used.

So what do prev and next do? They allow you to retrieve a value in a column from a previous row, or the next (or upcoming) row. This can be very useful in many situations. For instance, calculating the time between two rows based on a datetime column, or the change in a value from one row to the next.

As always, read on for a series of examples from Robert.
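To make the mechanics concrete before you click through, here is a small sketch of my own (the table and column names are invented): prev reaches back one row, and since it only works on serialized input, a sort by is enough to establish the row order.

```kusto
// Time gap and value change between consecutive rows. sort by yields a
// serialized row order, which prev() requires. Data here is invented.
datatable(EventTime: datetime, ErrorCount: long)
[
    datetime(2023-07-01 08:00:00), 3,
    datetime(2023-07-01 08:05:00), 7,
    datetime(2023-07-01 08:12:00), 4
]
| sort by EventTime asc
| extend MinutesSincePrev = datetime_diff('minute', EventTime, prev(EventTime)),
         CountChange = ErrorCount - prev(ErrorCount)
```

The first row has no predecessor, so both computed columns come back empty there; prev also accepts optional offset and default-value arguments to handle that.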

Percentage by Group in R

Steven Sanderson performs a breakdown:

Calculating percentages by group is a common task in data analysis. It allows you to understand the distribution of data within different categories. In this blog post, we’ll walk you through the process of calculating percentages by group using three popular R packages: Base R, dplyr, and data.table. To keep things simple, we will use the well-known Iris dataset.

The Iris dataset contains information about different species of iris flowers and their measurements, including sepal length, sepal width, petal length, and petal width. We will focus on the ‘Species’ column and calculate the percentage of each species in the dataset.

Read on for the three approaches. I think the Tidyverse approach is the easiest to understand in this case, though all three get you to the answer.
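For reference, the dplyr version comes down to a count followed by a division; this is my own condensed sketch rather than Steven’s exact code:

```r
library(dplyr)

# Rows per species, then each group's share of the total as a percentage.
iris %>%
  count(Species) %>%
  mutate(percent = 100 * n / sum(n))
```

data.table and Base R (with table() and prop.table(), for instance) get you the same numbers with different syntax.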

Power BI Dataset Data Source Credential Options

Angela Henry puts together a list:

If we are using enterprise-wide datasets, we don’t want Power BI datasets owned by individuals; we want them to be owned by a Service Principal so they aren’t relying on specific individuals when things go sideways (and because we all want to go on vacation at some point). However, it’s not always clear what credentials will actually be used for our data sources in our datasets when using a Service Principal. In a previous post, I talked about how to set up a service principal to take over a dataset when using data gateways, but one of the prerequisites I listed was that your data sources needed to be configured with appropriate credentials. That’s where this post comes in.

You essentially have three options for data source credentials, depending on your data source type.

Click through for that list of three options, as well as the trade-offs inherent in each.

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity thanks to its Python-based programmatic interface, Python being the first-choice language of data engineers and data ops. The framework allows defining complex pipelines that move data between different parts of a system, potentially implemented using different technologies.

The following article shows how to set up a managed instance of Apache Airflow and define a very simple DAG (directed acyclic graph) of tasks that does the following:

  • Uses an Azure registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP error statistics based on web log records for the last day.

Click through for the process.
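As a rough sketch of what the result can look like (this is my own minimal example, not Michael’s code; it assumes Airflow 2.4+ with the Microsoft Azure provider package installed, and the connection, database, table, and column names are all placeholders):

```python
# Daily DAG that runs one KQL query against Azure Data Explorer. The
# Airflow connection "adx_default" would hold the registered application's
# credentials; database/table/column names are invented for illustration.
from datetime import datetime

from airflow import DAG
from airflow.providers.microsoft.azure.operators.adx import (
    AzureDataExplorerQueryOperator,
)

with DAG(
    dag_id="adx_daily_http_errors",
    schedule="@daily",
    start_date=datetime(2023, 7, 1),
    catchup=False,
) as dag:
    http_error_stats = AzureDataExplorerQueryOperator(
        task_id="http_error_stats",
        azure_data_explorer_conn_id="adx_default",
        database="WebLogs",
        query="""
            WebLogRecords
            | where Timestamp > ago(1d) and StatusCode >= 400
            | summarize Errors = count() by StatusCode
        """,
    )
```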

New Script: sp_HealthParser

Erik Darling has a new script for us:

I recently found myself in the midst of a client issue that led me to need a bunch of corroborating data from the system health extended event session.

There’s hardly any good documentation on it, and even fewer well-written queries or resources for parsing data out of it.

So now I’m making it easy for you, because I care about you more than Microsoft does.

Click through for a link to the script and to get an idea of what it provides.
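If it follows the pattern of Erik’s other procedures (an assumption on my part; check the script itself for the real parameter list), usage starts as simply as:

```sql
-- Parse the system_health session's output with the defaults.
EXEC dbo.sp_HealthParser;

-- Erik's procedures typically document themselves via @help;
-- that parameter is assumed here, not confirmed from the post.
EXEC dbo.sp_HealthParser @help = 1;
```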
