
Curated SQL Posts

Query Compilation Time Matters

David Klee lays out an argument:

SQL Server query developers, listen up! Query execution time is not everything you should be worried about. You need to examine the parse and compilation time for each of your queries too.

Read on for the crux of David’s argument. There are things you can do about query compilation time, starting with database design (normalize tables, include key constraints, add appropriate indexes, etc.) and continuing with query design (keep queries simple, limit use of functions, limit use of nested views, break complicated queries into multiple steps with temp tables as intermediaries, etc.). One mitigating factor, however, is that compilation time matters much less if you retain the compiled plan for a while and reuse it frequently.
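
If you want a quick look at which cached plans cost the most to compile, the showplan XML carries a CompileTime attribute (in milliseconds) that you can pull out of the plan cache. Here’s a rough Python-plus-pyodbc sketch of that idea; the connection string is a placeholder and reading these DMVs requires VIEW SERVER STATE, so treat it as a starting point rather than David’s approach.

    # Sketch: pull compile time (ms) out of cached showplan XML via pyodbc.
    # The connection string is a placeholder; reading these DMVs needs VIEW SERVER STATE.
    import pyodbc

    QUERY = """
    SELECT TOP (20)
        st.[text] AS query_text,
        qp.query_plan.value(
            'declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
             (//p:QueryPlan/@CompileTime)[1]', 'int') AS compile_time_ms
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE qp.query_plan IS NOT NULL
    ORDER BY compile_time_ms DESC;
    """

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=yourserver;"
        "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes"
    )
    for text, compile_ms in conn.execute(QUERY):
        print(f"{compile_ms or 0:>8} ms  {(text or '')[:80]!r}")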


Case Sensitivity in Power BI

Kurt Buhler is going to raise my blood pressure this morning:

Most Power BI models are case-insensitive, meaning that “Bonk” is the same as “BONK”. However, Power BI data models can also be created as case-sensitive if you create a Direct Lake model in Fabric, or create a new model with external tools and enter a case-sensitive collation property. Two otherwise identical models which differ only in this case-sensitivity may produce different results, even though they’re using the same data, DAX, relationships, and tables.

It’s useful to know how case-sensitivity affects your model and its query results. You should also be able to identify and validate whether your model is case-sensitive. This is particularly important in the following scenarios:

Read on for those scenarios and how you can fix the problem of case sensitivity. My official stance on case sensitivity, by the way, is that applications should be case-insensitive on input but retain casing on output, so “dog” = “Dog” = “DOG” for sorting and querying, but if I saved “Dog” then that’s what should display.
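
That stance is easy enough to encode, by the way. Here’s a minimal Python sketch of the idea (nothing to do with Kurt’s post itself): compare, sort, and deduplicate on a case-folded key, but display whatever casing was originally saved.

    # Case-insensitive on input, original casing on output.
    records = ["Dog", "dog", "DOG", "cat"]

    # Deduplicate on a folded key, keeping the first casing that was saved.
    seen = {}
    for value in records:
        seen.setdefault(value.casefold(), value)

    # Sort and compare on the folded key, but display the stored casing.
    for value in sorted(seen.values(), key=str.casefold):
        print(value)   # prints "cat" then "Dog"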


Whitepapers for Oracle and SQL Server in Azure

Kellyn Gorman has been busy:

I’ve been pretty busy with work and travel, but I finally got an official Silk Github repository to publish a couple new white papers and sizing assessment worksheets for customer access.  These are primarily Oracle and SQL Server to Azure focused white papers, but I will be publishing ones on GCP next, to be followed by AI and other database platforms soon.

Click through for links to the documents.


(Near)-Real-Time Analysis with Microsoft Fabric

Reza Rad continues a series on Microsoft Fabric:

Microsoft Fabric offers a workload for real-time solutions. Real-Time Analytics can be used for streaming data, such as the data coming from IoT devices. It can be used not only to ingest the data but also to analyze it and use it for other Fabric workloads, such as data science. In this article and video, you will learn what Real-Time Analytics in Microsoft Fabric is and how it works.

Read on for a detailed demo.
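
Reza’s demo happens in the Fabric portal, but once data lands in a KQL database you can also query it from code. Here’s a minimal, hypothetical sketch using the azure-kusto-data Python package; the query URI, database name, and table name are placeholders you’d swap for your own.

    # Sketch: query a Fabric KQL database from Python (azure-kusto-data).
    # The query URI, database name, and table name are placeholders.
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    query_uri = "https://<your-kql-database-query-uri>"   # from the database's details page
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(query_uri)
    client = KustoClient(kcsb)

    response = client.execute("SensorDb", "SensorReadings | take 10")
    for row in response.primary_results[0]:
        print(row)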


A Primer on A/B Testing for Engineers

John Mount performs some testing:

I’d like to discuss a simple variation of A/B testing in an engineering style.
By “an engineering style” I mean:

  • We will work a simulated example to see that the system works as claimed.
  • We will exhibit examples of problems before trying to fix them.
  • We will demonstrate all of the top level claims as calculations, and not delegate these to references.
  • We will leave fundamental math to the references, and not try to re-derive it.

In my opinion, far too few A/B testing treatments check soundness, even on simulated data. This makes it easy for such articles to leave out important steps. If a relied-on reference omits a step, the derived work may have to do the same.
We will implement the experiment design directly, instead of using a canned power calculator, so we have a place to discuss some of the issues in A/B test design.

This is an excellent dive into the topic and I highly recommend taking the time to read it.
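
In that spirit of checking the machinery on simulated data, here’s a small (and far less thorough) Python sketch: simulate many A/A and A/B experiments and see how often a two-proportion z-test rejects. The conversion rates, sample size, and alpha are arbitrary illustration values, not numbers from John’s article.

    # Sketch: check a two-proportion z-test on simulated A/A and A/B data.
    # Rates, sample sizes, and alpha are illustrative choices only.
    import numpy as np
    from statsmodels.stats.proportion import proportions_ztest

    rng = np.random.default_rng(2023)
    n_per_arm, alpha, n_sims = 10_000, 0.05, 2_000

    def rejection_rate(p_a: float, p_b: float) -> float:
        """Fraction of simulated experiments where the test rejects at alpha."""
        rejections = 0
        for _ in range(n_sims):
            conv_a = rng.binomial(n_per_arm, p_a)
            conv_b = rng.binomial(n_per_arm, p_b)
            _, p_value = proportions_ztest([conv_a, conv_b], [n_per_arm, n_per_arm])
            rejections += p_value < alpha
        return rejections / n_sims

    # A/A run: the rejection rate should sit close to alpha (false-positive check).
    print("A/A:", rejection_rate(0.050, 0.050))
    # A/B run with a real lift: the rejection rate is the empirical power.
    print("A/B:", rejection_rate(0.050, 0.055))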


Time Series Stationarity Testing in R

Steven Sanderson isn’t just spinning in place:

Before we delve into the ts_adf_test() function, let’s understand the concept behind it. The Augmented Dickey-Fuller (ADF) test is a crucial tool in time series analysis. It’s like the Sherlock Holmes of time series data, helping us detect whether a series is stationary or not. Stationarity is a fundamental assumption in time series modeling because many models work best when applied to stationary data.

So, why “Augmented”? Well, it’s an extension of the original Dickey-Fuller test that accounts for more complex relationships within the time series data.

Click through to see how you can use the ts_adf_test() function to get a better feel for whether a time series is stationary.
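
Steven’s function is R; if you work in Python, the same Augmented Dickey-Fuller test lives in statsmodels as adfuller(). Here’s a quick sketch, as a rough analogue only, run on a simulated random walk (non-stationary) and its first difference (stationary).

    # Sketch: ADF test in Python via statsmodels, a rough analogue of ts_adf_test().
    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(42)
    random_walk = np.cumsum(rng.normal(size=500))   # unit root: should look non-stationary
    differenced = np.diff(random_walk)              # first difference: should look stationary

    for name, series in [("random walk", random_walk), ("differenced", differenced)]:
        stat, p_value, *_ = adfuller(series)
        verdict = "stationary" if p_value < 0.05 else "non-stationary"
        print(f"{name:>12}: ADF stat {stat:6.2f}, p-value {p_value:.4f} -> {verdict}")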


Running Apache Flink Jobs from HDInsight

Sairam Yeturi builds a streaming job:

Have you already created your first Apache Flink® cluster and submitted your streaming job on it with HDInsight on AKS?

Well, if you have yet to do that, let me help you get started.

Click through for a step-by-step walkthrough on how to create a Flink-centric HDInsight cluster on Azure Kubernetes Service and how to create a new job, assuming you already have the JAR file for that job.
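
The walkthrough assumes a pre-built JAR, but if you want to prototype the job logic first, Flink also ships a Python API. Here’s a minimal PyFlink sketch with toy data, executed locally, just to show the shape of a streaming job; it isn’t part of Sairam’s post.

    # Minimal PyFlink sketch (runs locally): the shape of a streaming job,
    # independent of the JAR-based submission in the linked walkthrough.
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)

    # Toy source; a real job would read from Kafka, Event Hubs, etc.
    readings = env.from_collection([("sensor-1", 20.5), ("sensor-2", 31.0), ("sensor-1", 22.1)])

    (readings
        .filter(lambda r: r[1] > 21.0)                      # keep "hot" readings
        .map(lambda r: f"{r[0]} is running hot: {r[1]}")    # format an alert string
        .print())

    env.execute("hot-sensor-alerts")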


A Primer on Boyce-Codd Normal Form

I have a new video:

In this video, we drill into one of the two most important normal forms, learning what Boyce-Codd Normal Form (BCNF) is and how you can get to it, and walking through a practical example. We also learn why I cast so much shade on 2nd and 3rd Normal Forms.

Boyce-Codd Normal Form is one of the two most important normal forms, and I’m pretty happy with the way this video came together to explain how you can get from 1NF into BCNF, as well as the specific benefits this provides.
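
Not from the video, but here’s a toy way to see the core rule in code: BCNF demands that the determinant of every non-trivial functional dependency be a superkey. This hypothetical Python check tests whether a functional dependency holds in a sample table (the table and columns are made up).

    # Toy check: does the functional dependency X -> Y hold in this table?
    # BCNF requires the determinant of every non-trivial FD to be a superkey.
    def fd_holds(rows, determinant, dependent):
        seen = {}
        for row in rows:
            key = tuple(row[col] for col in determinant)
            value = tuple(row[col] for col in dependent)
            if seen.setdefault(key, value) != value:
                return False   # same determinant, different dependent values
        return True

    enrollments = [
        {"student": "Ann", "course": "Databases", "instructor": "Codd"},
        {"student": "Bob", "course": "Databases", "instructor": "Codd"},
        {"student": "Ann", "course": "Statistics", "instructor": "Fisher"},
    ]

    # course -> instructor holds, but course is not a key of this table,
    # so the table violates BCNF and should be decomposed.
    print(fd_holds(enrollments, ["course"], ["instructor"]))   # True
    print(fd_holds(enrollments, ["student"], ["course"]))      # False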


Monitoring Power BI Gateways with Microsoft Fabric

Tom Martens builds a solution:

No matter what, when the on-premises gateways are not working as expected, data will not refresh, and DirectQuery queries will not succeed. For this reason, I consider it a good idea to track the well-being of these valuable resources. This article describes a solution built with Microsoft Fabric. It’s not necessary to use Fabric, and it’s also not necessary to build a solution on your own. If you want to track the well-being of your on-premises data gateways but do not want to build something, I recommend using the solution by Rui Romano, which you can find here: https://github.com/RuiRomano/pbigtwmonitor

I built this monitoring solution focusing on the well-being of the on-premises data gateway. I might extend this solution in the future, but for now, it’s about the availability of the on-premises data gateway and the data gateway connections. Availability and analysis will follow during the coming weeks.

Click through for Tom’s solution.
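
If you’d rather start with a quick poll of your own before building out something like this, the Power BI REST API has a gateways endpoint that lists the on-premises gateways your account can see. A minimal sketch, assuming you’ve already acquired an Azure AD access token with the appropriate Power BI scope (token acquisition is left out here):

    # Sketch: enumerate on-premises data gateways via the Power BI REST API.
    # Assumes ACCESS_TOKEN holds a valid Azure AD token for the Power BI service.
    import requests

    ACCESS_TOKEN = "<your-azure-ad-access-token>"   # placeholder

    response = requests.get(
        "https://api.powerbi.com/v1.0/myorg/gateways",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()

    for gateway in response.json()["value"]:
        print(gateway["id"], gateway["name"], gateway.get("type"))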
