Press "Enter" to skip to content

Curated SQL Posts

Azure Synapse Analytics Result Set Caching

Niko Neugebauer takes us through result set caching in Azure SQL Data Warehouse Azure Synapse Analytics:

I just put some result on the output, because as you can imagine there are some certain limits on the amount of the output that will be cached and that will be not. Besides the basic logical stuff, such as having deterministic functions only (functions which output will not be varying depending on the execution), not using System Objects or UDFs (and it seems that scalar UDF inlining is not a part of Azure SQL DW yet), no row-level security or column-level security enabled, the main thing and which seems to be pretty good decision as far as I am concerned – the row size larger than 64KB won’t be cached period.

Read on to see what Niko has learned, including cache performing and limitations. Between this and the data pools in SQL Server Big Data Clusters, Microsoft’s spent some time thinking about data caching in cloud-based versions of SQL Server.

Comments closed

Negative Port Numbers in SQL Server

Jason Brimhall takes us through a bit of a mystery:

Take quick note of the port number I have circled in red. This doesn’t match the original query at all. In fact, it doesn’t come anywhere close to the actual port number. In addition, the port number shown here is a negative value. Obviously a negative port is not correct as TCP/IP ports only range from 0-65535. So what is happening here?

Read on for the answer.

Comments closed

Powershell 7 Release Candidate

Max Trinidad has a nice update for us:

The moment everyone has been waiting for some time is here, PowerShell Release Candidate is available for download. This a “Go Live” release officially supported in production by Microsoft.

Everyone in the Microsoft PowerShell Team, with the help of the community, has done an excellent job with the evolution of this new version of PowerShell. Read all about it on the PowerShell DevBlogs recent post “Announcing the PowerShell 7.0 Release Candidate“.

Make sure to read all previous posts as they perfectly outlined under the “Why is PowerShell 7 so awesome?” section of the release candidate post.

Click through for more details. One of the nice things in this RC is a consistent Out-Gridview experience, so it’s not just for Windows anymore.

Comments closed

A Diagnostic Book for SQL Server

Emanuele Meazzo has a new years’s gift for us:

Welcome to 2020! I wanted to start this year by giving to all my fellow consultants another way to troubleshoot our beloved SQL Servers; I’ve already talked about diagnostic notebooks in the past, and now, since Azure Data Studio has implemented the feature, I wanted to group them into a Diagnostic Book.

As the name implies, a jupyter book is no other than a collection of notebooks (and markdown files) that groups everything in a coherent space, with an index and navigation options alike.

I think this sort of collection of notebooks (a, uh, note-book), if put together well, makes it easier to learn a new environment and understand key problems than a big Scripts.txt file or a folder full of scripts.

Comments closed

Using the Power BI Color Picker

David Eldersveld walks us through the Power BI color picker:

The new color picker allows colors in RGB format in addition to the hex color format that Power BI has used exclusively until now.

The new one also easily allows users to choose from a wider selection of shades and tones. This builds upon the simpler selection of hues and tints in the original.

In case you don’t know what David means, there is an excellent explanation of each term.

Comments closed

Performance Tuning Load of Partitioned Hive Tables on S3 with Spark

Dmitry Tolpeko walks us through a performance problem in Spark:

I have a Spark job that transforms incoming data from compressed text files into Parquet format and loads them into a daily partition of a Hive table. This is a typical job in a data lake, it is quite simple but in my case it was very slow.

Initially it took about 4 hours to convert ~2,100 input .gz files (~1.9 TB of data) into Parquet, while the actual Spark job took just 38 minutes to run and the remaining time was spent on loading data into a Hive partition.

Let’s see what is the reason of such behavior and how we can improve the performance.

Read on to see how.

Comments closed

Repartitioning and Coalescing in Spark

Divyansh Jain contrasts repartitioning and coalescing in Spark:

What is Coalesce?

The coalesce method reduces the number of partitions in a DataFrame. Coalesce avoids full shuffle, instead of creating new partitions, it shuffles the data using Hash Partitioner (Default), and adjusts into existing partitions, this means it can only decrease the number of partitions.

What is Repartitioning?

The repartition method can be used to either increase or decrease the number of partitions in a DataFrame. Repartition is a full Shuffle operation, whole data is taken out from existing partitions and equally distributed into newly formed partitions.

Read on to learn good reasons to use both.

Comments closed

Azure Data Factory Notifications

Rayis Imayev walks us through three different techniques for sending notifications in Azure Data Factory:

While working on data integration projects and using Azure Data Factory as your main orchestration tool will help you to develop strategic forward thinking about your development tasks: to ponder on what your data sources are, point of destinations to land this information into a new data model and transformation steps to shape data from the source to its destination. Just like when you play chess and have to plan ahead several of your next moves.

Along with this structural thinking to develop and execute your data flows, timely notifications of when something goes left or right would give you additional peace of mind.

Something I appreciate in this post is that Rayis contrasts the Azure Data Factory techniques with SSIS methods, giving you a solid base for comparison.

Comments closed

Columnstore Indexes in Azure SQL Database

Niko Neugebauer takes us through the columnstore offerings available in Azure SQL Database:

Almost 2 years ago (22nd of March 2018) in Columnstore Indexes – part 121 (“Columnstore Indexes on Standard Tier of Azure SQL DB”) I have already mentioned that Columnstore Indexes were available in Azure SQL Database in Standard 3 (S3) edition and higher, while people I meet keep on mentioning and believing that in order to get Columnstore Indexes one needs to use Premium editions.

Since that blog post a lot of time has passed and in the mean time we have got new tiers with new generations of provisioned General Purpose tiers (Generation 4, Generation 5, FSv2 Series & M Series) appearing, plus the Serverless Tier and not to forget the very promising Hyperscale tier … besides the Azure SQL Database Managed Instance of course, which has already been generally available for some time and the good old Elastic Pools which were never mentioned in original article.

It sounds like, on the whole, columnstore is a normal part of Azure SQL Database across the board—it’s not a special add-on feature.

Comments closed