Press "Enter" to skip to content

Author: Kevin Feasel

Updated SQL Server Diagnostic Queries

Glenn Berry has an updated set of DMV queries for us:

These are my SQL Server Diagnostic Information Queries for June 2020, aka my DMV Diagnostic Queries. They allow you to get a very comprehensive view of the configuration and performance of your SQL Server instance in a short amount of time. There are separate versions of these T-SQL queries for SQL Server 2005 through SQL Server 2019. I also have separate versions for SQL Managed Instance and Azure SQL Database. My diagnostic queries have been used by many people around the world since 2009. I make regular improvements to these queries each month.

This is one of my favorite methods for learning about a new SQL Server instance.

Comments closed

Dynamic Partition Pruning in Apache Spark 3.0

Anjali Sharma walks us through a nice improvement in Spark SQL coming with Apache Spark 3.0:

Partition pruning in Spark is a performance optimization that limits the number of files and partitions that Spark reads when querying. After partitioning the data, queries that match certain partition filter criteria improve performance by allowing Spark to only read a subset of the directories and files. When partition filters are present, the catalyst optimizer pushes down the partition filters. The scan reads only the directories that match the partition filters, thus reducing disk I/O.

However, in reality data engineers don’t just execute a single query, or single filter in their queries, and the common case is that they actually have dimensional tables, small tables that they need to join with a larger fact table. So in this case, we can no longer apply static partition pruning because the filter is on one side of the join, and the table that is more appealing and more attractive to prune is on the other side of the join. So, we have a problem now.

And that’s where dynamic partition pruning comes into play.

Comments closed

Fun with Randomness

Holger von Jouanne-Diedrich takes us through some random thoughts:

Many a student thinks that the first pic was created by some underlying pattern (because of its points clumping together in some areas while leaving others empty) and that the second one is “more” random. The truth is that technically both are not random (but only pseudo-random) but the first resembles “true” randomness more closely while the second is a low-discrepancy sequence.

While coming to the point of pseudo-randomness in a moment “true” randomness may appear to have a tendency to occur in clusters or clumps (technically called Poisson clumping). This is the effect seen (or shall I say heard) in the iPod shuffling function. Apple changed it to a more regular behaviour (in the spirit of the second picture)… which was then perceived to be more random (as with my students)!

Read the whole thing.

Comments closed

Power BI Report Page Access Control

Gilbert Quevauvilliers walks us through access control on Power BI pages:

With the update to the latest version in Power BI Desktop there is now the capability to control which users will see which pages in a report.

This is achieved using a combination of Row Level Security (RLS) and conditional formatting. This makes it a secure way of controlling access for specific users. I will demonstrate how this works below.

This is a clever solution to the problem of access control.

Comments closed

Migrating from Azure SQL DB to an Azure SQL Managed Instance

Eitan Blumin walks us through the process of moving from Azure SQL Database to an Azure SQL Managed Instance:

What we cannot do:

– You cannot backup and restore from Azure SQL DB to a Managed Instance.
– You cannot use Azure SQL DB as a source in Azure Data Migration Service (DMS).
– You cannot use Azure SQL DB as a source in Data Migration Assistant (DMA).
– You cannot use Azure SQL DB as a source in DB Mirroring, AlwaysOn, Transaction Log Shipping, or Replication.
– Implementing a custom T-SQL, SSIS, or ADF (Azure Data Factory) solution will be too complicated, and will require an unacceptable development overhead.

That last one is a bit iffy, though Eitan’s two solutions are going to be easier than a custom solution.

Comments closed

Optimizing Power Query Merges

Chris Webb wants to make your joins in Power Query faster:

The first question I decided to investigate was this:

Does the number of columns in a table affect the performance of a merge?

First of all, I created two identical queries called First and Second that connected to the CSV file, used the first row from the file as the headers, and set the data types to all seven columns to Whole Number.

Click through for the answer to this question. Chris promises a series out of this and I would expect there to be enough content for that.

Comments closed

Pulling GROUP BY Above a Join

Paul White has fun with a SQL Server optimization:

When the optimizer moves a Group By above a Join it has to preserve the semantics. The new sequence of operations must be guaranteed to return the same results as the original in all possible circumstances.

One cannot just pick up a Group By and arbitrarily move it around the query tree without risking incorrect results.

Read on for the demo, including a couple ingenious tricks the optimizer has up its sleeve.

Comments closed

The Power BI Pro’s Guide to Azure Synapse Analytics

Brett Powell gives a quick overview of a new whitepaper:

The Power BI Professional’s Guide to Azure Synapse Analytics, a white paper I wrote describing the Synapse Analytics platform and its benefits and use cases for Power BI professionals, was published a couple weeks ago. This post discusses a few themes from this paper and also shares a couple notes that weren’t included.

There are some interesting notes in here, so check those out and also get ahold of the whitepaper to understand how Power BI relates to the artist formerly known as Azure SQL Data Warehouse.

Comments closed

SQL Server on a Windows Container

Kevin Chant lives dangerously:

In this post I want to cover an interesting Windows Container with SQL Server installed experiment that I did. Because it was fairly involved, and it took a while.

In fact, this is the experiment I was talking about in my recent post about recent Azure Data Studio updates. Which you can read about in detail here.

My general philosophy is to avoid Windows containers at all costs, though I’m glad that there are some more adventurous than I.

Comments closed

Adaptive Query Execution with Spark SQL

Wenchen Fan, Herman von Hoevell, and MaryAnn Xue announce Adaptive Query Execution for Apache Spark 3.0:

Over the years, there’s been an extensive and continuous effort to improve Spark SQL’s query optimizer and planner in order to generate high-quality query execution plans. One of the biggest improvements is the cost-based optimization framework that collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values, etc.) to help Spark choose better plans. Examples of these cost-based optimization techniques include choosing the right join type (broadcast hash join vs. sort merge join), selecting the correct build side in a hash-join, or adjusting the join order in a multi-way join. However, outdated statistics and imperfect cardinality estimates can lead to suboptimal query plans. Adaptive Query Execution, new in the upcoming Apache SparkTM 3.0 release and available in the Databricks Runtime 7.0 beta, now looks to tackle such issues by reoptimizing and adjusting query plans based on runtime statistics collected in the process of query execution.

One of the biggest advantages of SQL as a fourth-generation language is that the database engine (whether that be SQL Server, Oracle, or Spark) gets the opportunity to write and re-write the set of operations needed to solve a query to try to find the best path which returns the same result set. These optimizations aren’t perfect, as any query tuner can tell you, but they can go a long way.

Comments closed