Press "Enter" to skip to content

Curated SQL Posts

The Basics of Table Variables

Jared Poche explains some of the value of table variables:

In having a talk reviewed recently, it was suggested I spend more time defining some of the subjects I touched on. It occurred to me that if I should go over (or at least introduce) these ideas during a talk for a SQL Saturday audience, some might find a post on the subject useful. Hence my recent post on key lookups.

Another such topic is table variables. I use table variables frequently at my current job, but they came up very infrequently when I worked at CSS in Microsoft. I remember the conversations about them being very simple at the time, as in, “you should just use temp tables instead.” But there is a lot of utility with table variables, and they could be a useful arrow in your quiver.
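For anyone who has not used one before, a minimal sketch (not from Jared's post) of declaring and querying a table variable:

    DECLARE @Orders TABLE
    (
        OrderID   int            NOT NULL PRIMARY KEY,
        OrderDate date           NOT NULL,
        Amount    decimal(10, 2) NOT NULL
    );

    -- Scoped to the batch, procedure, or function that declares it
    INSERT INTO @Orders (OrderID, OrderDate, Amount)
    VALUES (1, '2021-06-01', 19.99),
           (2, '2021-06-02', 42.00);

    SELECT OrderID, Amount
    FROM @Orders
    WHERE OrderDate >= '2021-06-02';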

Click through for the full story.


Removing Data from Query Store

Erin Stellato shows us how to do some spring cleaning:

Automatic removal of data from Query Store is based on configuration, notably the MAX_STORAGE_SIZE_MB, CLEANUP_POLICY (STALE_QUERY_THRESHOLD_DAYS), and SIZE_BASED_CLEANUP_MODE settings. While these are detailed in another post, it’s worth reiterating that ideally, size-based cleanup never kicks in. A database’s Query Store should be sized to accommodate N days’ worth of data, where N is set by CLEANUP_POLICY (STALE_QUERY_THRESHOLD_DAYS). When the size of the Query Store exceeds 90% of MAX_STORAGE_SIZE_MB, and if SIZE_BASED_CLEANUP_MODE is set to AUTO, then cleanup will kick in. This algorithm is not efficient, and it runs single-threaded. It looks for queries that are infrequently executed/less important and deletes those, one by one, until the size is less than 80% of MAX_STORAGE_SIZE_MB. Avoid this type of cleanup if at all possible.
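All of those knobs live on the database itself; a minimal sketch of setting them (the values here are illustrative, not recommendations):

    ALTER DATABASE [YourDatabase]
    SET QUERY_STORE
    (
        OPERATION_MODE = READ_WRITE,
        MAX_STORAGE_SIZE_MB = 1024,                           -- size the store for N days of data
        CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
        SIZE_BASED_CLEANUP_MODE = AUTO                        -- ideally this never has to run
    );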

There is a better way. And an even better way thanks to some of my colleagues.


Indexes and Sorts

Chad Callihan reminds us that sort order can matter for indexes:

When you’re working on an index for a query ordering by one column in ascending order and another column in descending order, do you have your index created to match? Did you know you can specify ASC or DESC in an index? Let’s go through a scenario where ordering in an index makes a noticeable difference.

This is particularly important for window functions—the optimizer can sometimes be smart enough to recognize that a value is in reverse order and not need a sort operator, but as soon as you drop that OVER() clause in, if things aren’t in the exact order they need to be, you get a sort operator thrown in for free. Or, well, the “your query is now a little bit slower” version of free.
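A minimal sketch of the idea (table and index names are invented here): if the index key order, including direction, matches what the window function needs, the optimizer can read the index in order rather than adding a Sort.

    CREATE TABLE dbo.Sales
    (
        CustomerID int      NOT NULL,
        SaleDate   datetime NOT NULL,
        Amount     money    NOT NULL
    );

    -- Key direction matches the ORDER BY inside OVER() below; INCLUDE makes it covering
    CREATE INDEX IX_Sales_Customer_SaleDateDesc
        ON dbo.Sales (CustomerID ASC, SaleDate DESC)
        INCLUDE (Amount);

    SELECT CustomerID,
           SaleDate,
           Amount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY SaleDate DESC) AS rn
    FROM dbo.Sales;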


Deploying Custom Docker Images in Azure ML

Tsuyoshi Matsuzaki shows us how to deploy an Azure ML model via custom Docker image:

In my earlier post, I showed you how to bring your own custom docker image to training with Azure Machine Learning.
Here, by contrast, I’ll show you how to bring a custom docker image into model deployment.

In Azure Machine Learning, the base docker image used in deployment includes the inferencing assets, such as the Flask server. So you should use an AML-compliant image as your base image, even when you use your own custom docker image.
The list of these maintained AML images is available at https://github.com/Azure/AzureML-Containers.

Read on for an example.


Representing Dates in Power BI: Date or Integer?

Marco Russo and Alberto Ferrari share their take on a classic debate:

A question that is often asked during the design of a Power BI data model is whether it is better to use an Integer or a Datetime column to link a fact table with the Date dimension. Historically, using Integers has always been a better choice in database design. However, Tabular is an in-memory columnar database, and its architecture is quite different from the relational databases we might be used to working with.

Indeed, in Tabular there are no technical differences between using a Datetime or an Integer to create a relationship. The database size, the query speed, and any other technical detail are absolutely identical. Therefore, the choice is not related to technical aspects, but rather to the convenience of the design. Depending on the specific requirements of your model, you might favor one data type over the other. In the most common scenarios, a Datetime proves to be better because it provides more possibilities to compute values on dates without having to rely on relationships. With that said, if your model uses Integers and you do not need to perform calculations on the dates represented in the table, then you can choose the most convenient data type – that is, the one already used in the original data source.

The remaining part of the article aims to prove the previous statements and to provide the technical details of how we tested the respective performance of the two options.

Click through for Marco and Alberto’s analysis, noting that “date” here does not include time of day, so it would have the same cardinality as the integer date key. This was a more important consideration fifteen years ago, before columnstore technologies (like columnstore indexes and VertiPaq) were readily available, when that 4-byte integer was considerably smaller than an 8-byte DATETIME.
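For context, the two designs being compared look something like this on the source side (a minimal sketch; both keys have exactly one value per calendar day, so their cardinality is identical):

    CREATE TABLE dbo.DimDate
    (
        DateKey int  NOT NULL PRIMARY KEY,  -- e.g. 20210607
        [Date]  date NOT NULL UNIQUE,       -- e.g. 2021-06-07
        [Year]  int  NOT NULL,
        [Month] tinyint NOT NULL
    );

    CREATE TABLE dbo.FactSales
    (
        DateKey   int   NOT NULL,  -- option 1: relate on the integer key
        OrderDate date  NOT NULL,  -- option 2: relate on the date column
        Amount    money NOT NULL
    );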


Finding and Removing Custom Roles, Schemas, and Users from a Database

Thomas Williams wants to go back to square one:

I’m a fan of the built-in database roles like db_datareader to standardise & simplify permissions (sorry Dr. Greg Low!) and recently I needed to do just that in a database created using SQL Server 2000, and remove old defaults and a lot of custom roles, schemas and users.

I wrote the set of queries below to generate scripts to remove non-built-in roles, schemas and users, when compared to the model database on a new SQL Server 2019 server.

After running the script generated by the queries, I added back users and gave appropriate roles (like db_datareader).
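This is not Thomas’s script, but as a flavor of the approach, a minimal sketch that lists non-built-in principals and schemas in the current database:

    -- Custom users and roles: skip dbo, guest, INFORMATION_SCHEMA, sys, and the fixed roles
    SELECT name, type_desc
    FROM sys.database_principals
    WHERE principal_id > 4
      AND is_fixed_role = 0;

    -- Custom schemas: skip the built-in schemas and the fixed-role schemas
    SELECT name
    FROM sys.schemas
    WHERE schema_id > 4
      AND schema_id < 16384;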

Read on for the script.


Deploying Datasets in Azure Analysis Services and Power BI PPU

Gilbert Quevauvilliers continues a series on migrating from Azure Analysis Services to Power BI Premium Per User:

Welcome to part 8, where in this blog post I am going to compare deploying datasets.

For those people who are not exactly sure what deployments are, what this means is that when you are using Power BI Desktop and you click on Publish, you are effectively deploying your changes to the Power BI Service (which could also be a server in the cloud).

In this blog post I will show the differences when completing a deployment from AAS and then PPU.

Read on to see several techniques for deploying for each technology.


Ways to avoid the MERGE Operator

Michael J. Swart has important bullet points:

Aaron Bertrand has a post called Use Caution with SQL Server’s MERGE Statement. It’s a pretty thorough compilation of all the problems and defects associated with the MERGE statement that folks have reported in the past. But it’s been a few years since that post and in the spirit of giving Microsoft the benefit of the doubt, I revisited each of the issues Aaron brought up.

Some of the items can be dismissed based on circumstances. I noticed that:

– Some of the issues are fixed in recent versions (2016+).

– Some of the issues that have been marked as won’t fix have been fixed anyway (the repro script associated with the issue no longer fails).

– Some of the items are complaints about confusing documentation.

– Some complaints are about issues that are not limited to the MERGE statement (e.g. concurrency and constraint checks).

Spoilers: some + some + some + some is still a lot less than all. Read the whole thing.
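If you do decide to steer clear of MERGE, the common alternative is separate statements with appropriate locking hints; a minimal sketch of a key-matched upsert (table and parameter names are made up):

    BEGIN TRANSACTION;

    -- Try the update first, taking locks that block a concurrent insert of the same key
    UPDATE dbo.Target WITH (UPDLOCK, SERIALIZABLE)
    SET    Amount = @Amount
    WHERE  TargetID = @TargetID;

    -- If no row matched, insert instead
    IF @@ROWCOUNT = 0
    BEGIN
        INSERT dbo.Target (TargetID, Amount)
        VALUES (@TargetID, @Amount);
    END;

    COMMIT TRANSACTION;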


An Overview of Amazon Athena

Aveek Das takes us through the basics of Amazon Athena:

Serverless. Since Amazon Athena is offered as a fully managed cloud service, customers do not need to take on the pain of installing and maintaining separate infrastructure for this. You can start by logging into the AWS Web console and proceeding to Amazon Athena.

Pay Per Query. You only pay for queries you execute. This is very cost-effective, as you can easily figure out your monthly expenses based on your usage pattern. On average, users pay 5 USD for each terabyte of data scanned. This can be further optimized by creating partitions or compressing your dataset.

Interactive Performance. We do not need to worry about the resources that work behind the scenes. When a query is executed, Athena automatically runs the query in parallel across multiple resources, returning results faster.
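To make the partitioning point concrete, a minimal sketch of an Athena table over S3 (the bucket and paths are hypothetical):

    -- Define an external table over Parquet files in S3, partitioned by date
    CREATE EXTERNAL TABLE web_logs (
        request_time string,
        url          string,
        status       int
    )
    PARTITIONED BY (dt string)
    STORED AS PARQUET
    LOCATION 's3://my-bucket/web-logs/';

    -- Register the existing partition folders, then filter on the partition column
    MSCK REPAIR TABLE web_logs;

    SELECT status, count(*) AS hits
    FROM web_logs
    WHERE dt = '2021-06-01'
    GROUP BY status;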

Read on to see an example of Athena in action.


Azure Functions and (Lack of) F# Support

Jamie Dixon has a shaggy dog tale:

When Azure Functions first came out, F# had pretty good support – templates, the ability to run a .fsx file, cool examples written by Don… Fast forward to 2021 and we have bupkis. I recently wrote a dictionary of amino acid weights that would be perfect as a service: pass in a kmer length and weight, get all hits from amino acids.

I first attempted to create a function app, select a .fsx template, and write the code in my browser. Alas, only .csx scripting is available.

Not to be too cutesy about it, but it would be nice if the product that allows for the execution of functions in a cloud service supported the .NET language which most explicitly embraces the notion of functions. If you feel similarly, there is an open feedback ticket.
