Press "Enter" to skip to content

Month: November 2019

Optimizing for Sequential Keys

Milos Radivojevic is excited about OPTIMIZE_FOR_SEQUENTIAL_KEY:

The results show that the feature should be used for tables, where latch convoys already happen or are about to happen. In a not-so-heavy workloads, it brings a small overhead. You can see that inserts with 100 parallel sessions are 16% slower for a table with the optimized sequential key. However, when you hit the convoy, the feature definitely helps.

This graf is critical: if you don’t have high enough concurrency on the table, insertion can be a little slower than otherwise, so don’t go setting this for every table.

Leave a Comment

Creating Azure Data Studio Notebooks Using Powershell

Rob Sewell inverts the “Use Azure Data Studio to create Powershell notebooks” mantra:

This module contains only 3 commands at present

* Convert-ADSPowerShellForMarkdown

This will create the markdown link for embedding PowerShell code in a Text Cell for a SQL Notebook as described in this blog post

* New-ADSWorkBookCell

This command will create a workbook text cell or a code cell for adding to the New-ADSWorkBook command

* New-ADSWorkBook

This will create a new SQL Notebook using the cell objects created by New-ADSWorkBookCell

Click through for an example.

Leave a Comment

Get the Stack Overflow Columnstore Edition Database

Erik Darling has more for you on Stack Overflow + Columnstore:

If you want to download the database, here’s the magnet link for the torrent. I don’t have another means of distributing this; it’s too big of a file.

If you want the GitHub scripts to create and load data, head over here.

In addition, Erik has some quick queries showing table relationships in a world without foreign key constraints:

To get you started exploring the Stack Overflow column store database, here are some queries that show how tables are related.

The two main relationships are User Id, and Post Id.

Quick side note: joining together large columnstore indexed tables? Generally not the best idea.

Leave a Comment

Deploying a Big Data Cluster with Azure Data Studio

Mohammad Darab shows how you can deploy a Big Data Cluster to Azure Kubernetes Service using Azure Data Studio:

A few months ago I posted a blog on deploying a BDC using the built-in ADS notebook. This blog post will go a bit deeper into deploying a Big Data Cluster on AKS (Azure Kubernetes Service) using Azure Data Studio (version 1.13.0). In addition, I’ll go over the pros and cons and dive deeper into the reasons why I recommend going with AKS for your Big Data Cluster deployments.

AKS does make it pretty easy. The toughest part for me was figuring out which instance types were supported—I tried a few which would save me money and they weren’t available. I do like that they added a check to view availability before completing the notebook; that wasn’t in the preview version.

Leave a Comment

Explaining Duplicate Indexes

Kevin Hill will be shocked and amazed that I finally linked to him again:

Duplicate indexes are those that exactly match the Key and Included columns.  That’s easy.

Possible duplicate indexes are those that very closely match Key/Included columns.

Why do you care?
Indexes have to be maintained. When I say that, most people immediately think of Reorganizing, rebuilding and updating statistics, and they are not wrong.

Click through for a great explanation of what “duplicate” indexes are, as well as ways to find them. If you’re searching for dupes, I’d recommend a couple blog posts from Kim Tripp as well on whether an index is really a duplicate and how to remove duplicate indexes.

Leave a Comment

Using pdqr for Statistical Uncertainty

Evgeni Chasnovski has a new CRAN package:

I am glad to announce that my latest, long written R package ‘pdqr’ is accepted to CRAN. It provides tools for creating, transforming and summarizing custom random variables with distribution functions (as base R ‘p*()’, ‘d*()’, ‘q*()’, and ‘r*()’ functions). You can read a brief overview in one of my previous posts.

Click through for a description of the package.

Leave a Comment

Using Azure DevOps for Power BI CI/CD

Marc Lelijveld and Ton Swart look at today’s CI/CD options for Power BI:

As a developer we might be used to working with Git repositories, especially in order to have release management in place. Git is well known as a modern version control system. By using Git, you will have a local copy of the code on your machine as well. Based on these local copies, you can continue developing. After you’re finished with your work, you can easily push your local repository to merge with the online (shared) repository. By doing this, only the changes will be pushed and saved in the online repository. In fact, only for the new code there will be a new version created. 

Versioning of Power BI files is a whole different story. Since pbix files are binary files, there is no way of checking-in only the code changes. The process of pushing changes identifies the pbix file as one object which has a new version.

Read on for the state of the art. To be honest, I don’t like the state of the art that much, but that has nothing to do with Marc and Ton’s great article.

Leave a Comment

New Features in Azure Synapse Analytics

James Serra gives us a bullet list of new features in Azure Synapse Analytics:

Almost lost in all the announcements from Ignite was a bunch of amazing new features that were added to the Provisioned Resources/SQL Pool section (read SQLDW functionalities) side of Azure Synapse Analytics (formally called Azure SQL Data Warehouse).

One of the more interesting options is ordered clustered columnstore indexes. That seems like something which would be nice to have on-prem. The segment elimination works on-prem today, but ordering is accidental at best. By that, I mean the way that SQL Server loads data into a CCI—roughly, in the order in which you insert it—is not guaranteed to work that way and could change in the future.

Leave a Comment

Star Schemas and Power BI

Alberto Ferrari explains why star schemas are so important to Power BI:

A common question among data modeling newbies is whether it is better to use a completely flattened data model with only one table, or to invest time in building a proper star schema (you can find a description of star schemas in Introduction to Data Modeling). As coined by Koen Verbeeck, the motto of a seasoned modeler should be “Star Schema all The Things!”

The goal is to demonstrate that a report using a flattened table returns inaccurate numbers, whereas using a star schema turns it into a sound analytical system.

Read on for the example.

Leave a Comment

Things a New DBA Should Learn

John McCormack has three things new DBAs should learn first:

Help, the SQL Server is on fire and my really important process is not working. Why is the server so slow?

This scenario will happen to you. Invariably, the person (often a manager) will stand over you and expect you to knock out one or two lines of t-sql wizardry to get things running along smoothly again. First of all, I should say that in a perfect world, you will ask them to raise a ticket and you will work on it according to its priority against your other tasks. Then you can let them know what the issue was and what can be done to prevent it happening again. But we rarely work in a perfect world. In this scenario, you need one primary line of t-sql to get started.

John’s three points are a really good starting point.