Press "Enter" to skip to content

Curated SQL Posts

Halloween Problem and Inserts

Jared Poche continues a dive into the Halloween Problem:

I would have expected us to scan the temp table, then have a LEFT JOIN to the base table. The Table Spool is the red flag that we have an issue with the plan, and is frequently seen with Halloween protections.

The index scan on the base table seems to be overkill since we’re joining on the primary key columns (the key lookup isn’t much of a concern). But we’re likely doing the scan because of the spool; it’s SQL Server’s way of getting all relevant records in one place at one time, breaking the normal flow of row mode operation, to make sure we don’t look up the same record multiple times.

Read on to see the execution plan as well as Jared’s fix.

Comments closed

Storing SQL Server Database Files in Blob Storage

Tomaz Kastrun has a wacky idea:

Storing SQL Server database files in Azure blob storage is a great solution for all the databases that are often migrated between instances, servers, virtual machines, or would have been divided between instances. This scenario also has the positive aspect to it, since the ability to create snapshot backups to Azure is seamless.

Following the steps, we will create a Azure Blob storage, where MSSQL Server database files will reside with MSSQL Server running on-prem. Assuming, that you already have the Azure account (if not, you can get a free Azure account), let’s proceed by opening the Windows Terminal in PowerShell mode.

I’m impressed that it worked and could see it being an option for small demo databases, but I can’t imagine performance would be good enough for a production scenario.

Comments closed

Loading Data from S3 into Power BI

Gilbert Quevauvilliers loves a challenge:

I really enjoy a good challenge, and with my customer they have all their data stored in AWS S3. Whilst there is no native connector, I thought there must be a way for me to get the data from AWS S3 into Power BI.

I did a bit of Googling and could not find any suitable solution. I also found and learnt that I could use AWS Athena to query the data living in S3. (I am definitely an expert of have a lot of knowledge in the AWS space. I am fortunate that I have other people who know AWS and were able to setup, configure and give me the details to connect to S3 via AWS Athena)

Below are the steps on how I got this working.

Why they don’t have a proper connector is a bit of a head-scratcher to me given the sheer amount of data stored in S3 and the sheer number of connectors in Power BI.

Comments closed

Creating Power BI Measures via Visual Studio Code

Phil Seamark goes one step further with TOM:

My last blog introduced the idea of using Microsoft Visual Studio Code to work with Power BI Models. For this article, I build on that idea by showing how you can use a TOM based script to automatically generate measures in your model Power BI (or Azure Analysis Services) model.

For simplicity, the example in this blog will do the following:

– Connect to an instance of Power BI Desktop
– Iterate through every Table in the model
– Iterate through every Column in the “current” table from the outer loop
– If the Column is numeric and not hidden, create a simple [Sum of <column>] measure

Read on for demonstration code and a walkthrough of the process.

Comments closed

Ignoring security_error_ring_buffer_recorded Events

Erin Stellato recommends putting the system health extended event on a diet:

In Aaron’s post he refers to the security_error_ring_buffer_recorded event as “unactionable noise”, and frankly I can’t think of a better term for it.  I’ve never used it to troubleshoot it any authentication/security issue, and I’m very confident that Microsoft isn’t using it in its current state either.  In terms of the volume of that event, for the new client system that I looked at recently the five system_health files covered about four (4) hours of time.  Of the 1,851,741 million events captured in that time frame, the security_error_ring_buffer_recorded event showed up 1,838,882 times.  That’s 99.3% of the events…absolute noise.

Erin shows you how to turn this off and get rid of a mess of unhelpful messages.

Comments closed

Adding a Last Updated Time to Power BI Reports

Ed Hansenberry adds useful information to a Power BI report:

It is often useful to tell your users when the report refreshed so they have some idea of how current, or stale, the data is. Unfortunately, you cannot just add a TODAY() formula to your report as that will change each time they open the report. This method will give you the refresh time stamp and compensate for Daylight Savings Time.

This frustrates me a bit—the time when the report was generated is really easy to do in Reporting Services and is certainly a good practice to follow when building reports, and yet there’s a multi-step process involving writing M code to do something which ought to be trivial.

Comments closed

Dates and Timestamps in Spark 3.0

Maxim Gekk, et al, look at the different date and time data types in Apache Spark 3.0:

The definition of a Date is very simple: It’s a combination of the yearmonth and day fields, like (year=2012, month=12, day=31). However, the values of the year, month and day fields have constraints, so that the date value is a valid day in the real world. For example, the value of month must be from 1 to 12, the value of day must be from 1 to 28/29/30/31 (depending on the year and month), and so on.

These constraints are defined by one of many possible calendars. Some of them are only used in specific regions, like the Lunar calendar. Some of them are only used in history, like the Julian calendar. At this point, the Gregorian calendar is the de facto international standard and is used almost everywhere in the world for civil purposes. It was introduced in 1582 and is extended to support dates before 1582 as well. This extended calendar is called the Proleptic Gregorian calendar.

Starting from version 3.0, Spark uses the Proleptic Gregorian calendar, which is already being used by other data systems like pandas, R and Apache Arrow. Before Spark 3.0, it used a combination of the Julian and Gregorian calendar: For dates before 1582, the Julian calendar was used, for dates after 1582 the Gregorian calendar was used. This is inherited from the legacy java.sql.Date API, which was superseded in Java 8 by java.time.LocalDate, which uses the Proleptic Gregorian calendar as well.

Even in this three-paragraph snippet, you can already get a feeling for how complex working with dates can be. Then throw in the complexities of time and you get a detailed post full of good information.

Comments closed

Building an End-to-End Streaming App with Flink SQL

Jark Wu lays down the guantlet:

Apache Flink 1.11 has released many exciting new features, including many developments in Flink SQL which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.

In the following sections, we describe how to integrate Kafka, MySQL, Elasticsearch, and Kibana with Flink SQL to analyze e-commerce user behavior in real-time. All exercises in this blogpost are performed in the Flink SQL CLI, and the entire process uses standard SQL syntax, without a single line of Java/Scala code or IDE installation.

Read on for a demo using only bash and Flink SQL.

Comments closed

Working with SQL Server Configuration Files

Jamie Wick takes us through an underrated part of the SQL Server installer:

The ability to use a parameter file (configurationfile.ini), for automating the installation of SQL Server, has been around for many years. However, each release of SQL Server has had different parameters that could be included in the file. Here are some directions on how to find or create a parameter file, along with the parameter values that are supported by each version of SQL Server.

I appreciate the fact that every installation of SQL Server generates one of these and even points it out to you as you go through the installer wizard. And Jamie has gone a step further by giving us an Excel spreadsheet with all of the available settings and their defaults.

Comments closed

Futureproofing a Power BI Solution

Paul Turley begins a series on doing Power BI the right way:

The purpose of this post is to provide some guidance to help you design Power BI solutions that can survive the journey through these stages with as little “throw-away” design as possible. There will always be prototyping and redesign in any project but if you follow proven design patterns and lessons learned from prior experience, you’ll have a much better chance of building a reporting solution that will endure.

Oh, I should mention this… the difference between item #1 (the simple prototype) and #2 (working proof-of-concept) is that you need to throw away the first one – and anything else that doesn’t lay a foundation that you can build on top of. This is a hard lesson but one that will pay off by helping our teams, sponsors and business stakeholders understand that we can proceed after laying down solid bricks (based on well-defined functional and technical requirements) to build the rest of the solution.

That italicized part was important enough for me to call it out. Far too often we develop proofs of concept which work well enough for a demo, and then the next question is “Great, when will it be in production?”

Comments closed