Press "Enter" to skip to content

Curated SQL Posts

Optimize for Unknown with Inline Table-Valued Functions

Koen Verbeeck hits on a strange case:

Turns out SQL Server used a plan with a hash join in the fast query, and a nested loop in the slow query. Due to SQL Server also wildly using incorrect estimates, the nested loops performs really poorly. Quite similar to parameter sniffing with stored procedures. Erik Darling has written a great article about it: Inline Table Valued Functions: Parameter Snorting.

The thing is, in contrast to scalar functions or multi-statement table-valued functions, the iTVF should have better performance because it will be expanded into the calling query. This way, SQL Server can use “more correct” estimates and create a plan for each different parameter. Well, today was not that day.

Read on for details on how Koen performed troubleshooting and the solution.

Comments closed

Understanding the Size of a Power BI File

Lazaros Viastikopoulos explains that finding the size of a Power BI file isn’t quite as easy as you’d think:

As you can see from the above screenshot, the size of the PBIX file is 1.2 MB (1,238 KB) by looking at the size in file explorer. Also, when we look at all the files we are ingesting into Power BI, the total size comes to 2.8 MB. However, when we ingest data into Power BI which is processed to memory, it can be bigger than the PBIX file size due to some compression that occurs and various other elements which again, are not displayed through the disk.

So, when we are getting various errors notifying us that our Power BI file is too large and we see that it is under the 1GB restriction on disk, it can leave us scratching our head. But, as I mentioned above due to compression and various other elements, once data is processed to memory it can be larger then what we see on disk, but still below the total size of the data source we are ingesting due to the compression.

Click through for information on two tools you can use to determine the in-use size.

Comments closed

Finding Skew in a Spark DataFrame

Landon Robinson walks us through skew in Spark DataFrames:

Ignoring issues caused by skew can be worth it sometimes, especially if the skew is not too severe, or isn’t worth the time spent for the performance gained. This is particularly true with one-off or ad-hoc analysis that isn’t likely to be repeated, and simply needs to get done.

However, the rest of the time, we need to find out where the skew is occurring, and take steps to dissolve it and get back to processing our big data. This post will show you one way to help find the source of skew in a Spark DataFrame. It won’t delve into the handful of ways to mitigate it (repartitioning, distributing/clustering, isolation, etc) (but our new book will), but this will certainly help pinpoint where the issue may be.

Click through to learn more.

Comments closed

Cloning Delta Lakes

Burak Yavuz and Pranav Anand show us how to clone Delta Lakes:

Clones are replicas of a source table at a given point in time. They have the same metadata as the source table: same schema, constraints, column descriptions, statistics, and partitioning. However, they behave as a separate table with a separate lineage or history. Any changes made to clones only affect the clone and not the source. Any changes that happen to the source during or after the cloning process also do not get reflected in the clone due to Snapshot Isolation. In Databricks Delta Lake we have two types of clones: shallow or deep.

Read on to learn the differences, as well as a few useful scenarios.

Comments closed

The Merry-Go-Round Scan

David Fowler covers one of the best ways of optimizing frequent scans of large amounts of data:

As we all know, full table scans can be very expensive, poor old SQL is forced to read every single row in a table (of course that doesn’t always mean that it’s a bad choice for SQL).

Lets assume we’ve got a table scan happening that results in 1,000,000 page reads, that’s quite a bit of work for SQL to do. Now imagine another query comes in and needs to scan the same table, that’s also going to need to do 1,000,000 reads to get the data that it needs. If this table happens to be frequently accessed, this is soon going add up.

There’s a clever solution which tends to work better and better as you have more and more queries scanning the table.

Comments closed

The Value of FileTable

Louis Davidson shows us a good scenario for SQL Server’s FileTable:

Now, my pictures are very manageable and searchable using SQL Server. The fact that the names are very standardized also helps me with searching the names in a file folder (I copy all of the files from the share into a Dropbox folder so I have access to the files everywhere I am as well.)

And additional cool thing is that even on my Express Edition local server, the filetable data is backed up with the database, so when I backup the database, it backs up my pictures in addition to the textual data. That extra backup, along with the copy of the files in Dropbox give me extra relief that I have a decent backup if something gets corrupted.

With just a little bit of configuration, and a little bit of code, I have a simple database of my images that I can search using T-SQL in a manner that is a lot more powerful that what I can reasonably do using File Explorer, along with the ability to keep history of how the pictures are used. This would be very useful for any photographer with a stock of pictures that they want to search and keep history of, even if it is just baby pictures…of the puppy.

Click through for the example. I think FileTable is one of the most underutilized features from SQL Server 2012.

Comments closed

Power BI Icons for Diagrams.Net

Marc Lelijveld has some icons for us:

Previously, I used a simple PowerPoint slide when I drafted technical solution proposals. This took me a whole lot of time by copy-pasting all the images, make it look nice and connect the dots together. While tools like diagrams.net are built for this purpose, I always stuck with PowerPoint as there were no icons for all Power BI objects in this tool. Until now!

The online tool Diagrams.net allow you to quickly draft your solution architecture by dragging and dropping icons on a white canvas and easily connecting the dots together.

I’ve been a big fan of diagrams.net (nee draw.io), so thank you Marc for putting this together.

Comments closed

Azure Synapse Analytics Query Options

James Serra has a breakdown of what can query what in Azure Synapse Analytics:

The public preview version of Azure Synapse Analytics has three compute options and four types of storage that it can access (mentioned in my blog at SQL on-demand in Azure Synapse Analytics). This gives twelve possible combinations of querying data. Not all of these combinations currently are supported and some have a few quirks of which I list below.

Read on for a table which breaks down current functionality as well as expected GA functionality.

Comments closed

Query Folding and the Power BI Dataflows Enhanced Compute Engine

Matthew Roche dives into Power BI’s enhanced compute engine:

I’ve been seeing more questions lately about the dataflows enhanced compute engine in Power BI. Although I published a video overview and although there is feature coverage in the Power BI documentation, there are a few questions that I haven’t seen readily answered online.

A lot of these questions can be phrased as “what happens when I turn on the enhanced compute engine for my Power BI Premium capacity?”

Most of my responses start with the phrase “it depends” – so let’s look at some of the factors the answer depends on.

Click through for those factors.

Comments closed