Press "Enter" to skip to content

Curated SQL Posts

Explaining Duplicate Indexes

Kevin Hill will be shocked and amazed that I finally linked to him again:

Duplicate indexes are those that exactly match the Key and Included columns.  That’s easy.

Possible duplicate indexes are those that very closely match Key/Included columns.

Why do you care?
Indexes have to be maintained. When I say that, most people immediately think of Reorganizing, rebuilding and updating statistics, and they are not wrong.

Click through for a great explanation of what “duplicate” indexes are, as well as ways to find them. If you’re searching for dupes, I’d recommend a couple blog posts from Kim Tripp as well on whether an index is really a duplicate and how to remove duplicate indexes.

Leave a Comment

Using pdqr for Statistical Uncertainty

Evgeni Chasnovski has a new CRAN package:

I am glad to announce that my latest, long written R package ‘pdqr’ is accepted to CRAN. It provides tools for creating, transforming and summarizing custom random variables with distribution functions (as base R ‘p*()’, ‘d*()’, ‘q*()’, and ‘r*()’ functions). You can read a brief overview in one of my previous posts.

Click through for a description of the package.

Leave a Comment

Using Azure DevOps for Power BI CI/CD

Marc Lelijveld and Ton Swart look at today’s CI/CD options for Power BI:

As a developer we might be used to working with Git repositories, especially in order to have release management in place. Git is well known as a modern version control system. By using Git, you will have a local copy of the code on your machine as well. Based on these local copies, you can continue developing. After you’re finished with your work, you can easily push your local repository to merge with the online (shared) repository. By doing this, only the changes will be pushed and saved in the online repository. In fact, only for the new code there will be a new version created. 

Versioning of Power BI files is a whole different story. Since pbix files are binary files, there is no way of checking-in only the code changes. The process of pushing changes identifies the pbix file as one object which has a new version.

Read on for the state of the art. To be honest, I don’t like the state of the art that much, but that has nothing to do with Marc and Ton’s great article.

Leave a Comment

New Features in Azure Synapse Analytics

James Serra gives us a bullet list of new features in Azure Synapse Analytics:

Almost lost in all the announcements from Ignite was a bunch of amazing new features that were added to the Provisioned Resources/SQL Pool section (read SQLDW functionalities) side of Azure Synapse Analytics (formally called Azure SQL Data Warehouse).

One of the more interesting options is ordered clustered columnstore indexes. That seems like something which would be nice to have on-prem. The segment elimination works on-prem today, but ordering is accidental at best. By that, I mean the way that SQL Server loads data into a CCI—roughly, in the order in which you insert it—is not guaranteed to work that way and could change in the future.

Leave a Comment

Star Schemas and Power BI

Alberto Ferrari explains why star schemas are so important to Power BI:

A common question among data modeling newbies is whether it is better to use a completely flattened data model with only one table, or to invest time in building a proper star schema (you can find a description of star schemas in Introduction to Data Modeling). As coined by Koen Verbeeck, the motto of a seasoned modeler should be “Star Schema all The Things!”

The goal is to demonstrate that a report using a flattened table returns inaccurate numbers, whereas using a star schema turns it into a sound analytical system.

Read on for the example.

Leave a Comment

Things a New DBA Should Learn

John McCormack has three things new DBAs should learn first:

Help, the SQL Server is on fire and my really important process is not working. Why is the server so slow?

This scenario will happen to you. Invariably, the person (often a manager) will stand over you and expect you to knock out one or two lines of t-sql wizardry to get things running along smoothly again. First of all, I should say that in a perfect world, you will ask them to raise a ticket and you will work on it according to its priority against your other tasks. Then you can let them know what the issue was and what can be done to prevent it happening again. But we rarely work in a perfect world. In this scenario, you need one primary line of t-sql to get started.

John’s three points are a really good starting point.

Leave a Comment

Power BI Paginated Reports on the Cheap

John White shows how you can get paginated reports in Power BI at a lower cost:

In our scenario, we have a paginated report that uses a published Power BI dataset as a data source. as of this writing, there is no API call available to render a paginated report on demand, so we will rely on the scheduled subscription capability. in order to minimize the cost of the solution, we want the dedicated capacity to run as little as possible.

The solution will consist of an Azure logic app, and Power BI paginated report scheduling. An Azure logic app uses the same set of actions that a Flow in Power automate does, but is a little more flexible in its permissions model.

Read on for the step-by-step instructions.

Leave a Comment

SQL Server 2019 Installation Enhancements

Aaron Bertrand looks at a couple things you can configure on installation with SQL Server 2019:

One of the long-standing defaults in SQL Server has been the maximum degree of parallelism (MAXDOP), which has always been 0, meaning use (up to) all cores when the engine believes that will help. For many OLTP workloads, 0 is not the optimal setting, and you may want to use a different number depending on the behavior of your workload. I don’t want to belabor the thought process here, but this will be based on settings like the number of cores exposed to SQL Server, whether they are divided into NUMA nodes, and if there are other instances, applications, or services running on the same Windows Server. Microsoft’s guidelines are published here.

Aaron shows us how to set MAXDOP as well as min and max server memory for our new instance. It’s nice to see these types of additions to the setup process—that makes it a bit more likely that the DBA who installs instances only occasionally doesn’t forget to set these afterward.

Leave a Comment

Important Assumptions with Linear Models

Sebastian Sauer takes us through two of the most important assumptions of linear models:

Additivity and linearity as the second most important assumptions in linear models
We assume that \(y\) is a linear function of the predictors. If y is not a linear function of the predictors, we cannot expect the model to deliver correct insights (predictions, causal coefficients). Let’s check an example.

Read on to understand what this means, as well as the most important assumption.

Leave a Comment