July 2020 – Page 6 – Curated SQL

SQL Server Management Studio 18.6 Now GA

Published 2020-07-23 by Kevin Feasel

Drew Skwiers-Koballa announces SQL Server Management Studio version 18.6:

The 18.6 release is the second major release of SSMS in 2020 and packs several high impact changes, including a fix for crashes in database diagrams. Key fixes include:
– Save to XEL file error fix.
– Bacpac file import error fix.
– Database diagrams crash fix.
– Addressed sources of three common application hangs.

Lots of bugfixes in here, but there are a few new things as well.

Columnstore Index Maintenance

Published 2020-07-23 by Kevin Feasel

Ed Pollack continues a series on columnstore indexes:

Like with standard B-tree indexes, a columnstore index may be the target of a rebuild or reorganize operation. The similarities end here, as the function of each is significantly different and worth considering carefully prior to using either.
There are two challenges addressed by columnstore index maintenance:
1. Residual open rowgroups or open deltastores after write operations complete.
2. An abundance of undersized rowgroups that accumulate over time.

Read on for the full story.

Downgrading SQL Server on Linux

Published 2020-07-23 by Kevin Feasel

Sreekanth Bandarla wants to roll back cumulative updates on Linux:

Of course you can get this info from SQL or several other ways in Linux. Okay, now we know we got SQL Server 2019 CU5 running on this server to work with. Let’s just assume CU5 broke something in my database and I want to go back to CU4. How do I do that?

Click through to see how to do this for Red Hat (or any system using yum). Debian-based systems don’t have a downgrade option, but you can use apt-get install mssql-server=[version number] instead.

Splitting Data with T-SQL

Published 2020-07-23 by Kevin Feasel

Chris Hyde shows a few techniques for splitting out data into training, testing, and validation sets:

We see right away that this method failed horribly as all of the data was placed into the same dataset. This holds true no matter how many times we execute the code, and it happens because the RAND() function is only evaluated once for the whole query, and not individually for each row. To correct this we’ll instead use a method that Jeff Moden taught me at a SQL Saturday in Detroit several years ago – generating a NEWID() for each row, using the CHECKSUM() function to turn it into a random number, and then the % (modulus) function to turn it into a number between 0 and 99 inclusive.

I’d have to test it out, but I’d think you could modify method 3 to include a CROSS APPLY to perform one ABS(CHECKSUM(NEWID())) and get exact counts that way without a temp table.
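As a rough illustration of the per-row idea (not Chris's T-SQL, just a Python analogue), the sketch below draws a fresh random value for every row and uses a modulus to get a bucket between 0 and 99. Here `uuid.uuid4()` stands in for NEWID(), and the 70/20/10 split and the `assign_split` helper are assumptions made for the example.

```python
import uuid
from collections import Counter


def assign_split(train_pct=70, test_pct=20):
    """Draw a fresh random bucket in 0..99 for one row and map it to a dataset."""
    # uuid4() plays the role of NEWID(); the modulus mirrors the % 100 step.
    bucket = uuid.uuid4().int % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + test_pct:
        return "test"
    return "validation"


rows = range(10_000)  # hypothetical row ids

# Called once per row, so every row gets its own random draw.
splits = {row: assign_split() for row in rows}
counts = Counter(splits.values())

# The failure mode described in the quote: one draw reused for the whole set.
single_draw = assign_split()
broken_counts = Counter(single_draw for _ in rows)

print(counts)
print(broken_counts)
```

The per-row version only approximates the target percentages, which is why exact counts need an extra step such as the one suggested above.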

Quality Azure Data Studio Extensions

Published 2020-07-23 by Kevin Feasel

Randolph West vouches for some Azure Data Studio extensions:

It’s worth mentioning that for the most part Azure Data Studio extensions are extremely lightweight, both in download size and memory usage. Installing this many on SQL Server Management Studio (SSMS) would slow it down dramatically.
Note: not all extensions can be installed from the Extensions pane. For many of them you must visit a website, download the VSIX file and install it manually using the File > Install Extension from VSIX Package menu option. In most cases you can trust extensions from reputable publishers, but always take care.

Randolph has quite a few more extensions than I do, but I can’t say any of those are a bad choice.

The Basics of Randomized Response

Published 2020-07-22 by Kevin Feasel

Holger von Jouanne-Diedrich explains how randomized response can protect any single person’s opinion from a pollster while providing insight on the whole population:

So, is there a method to find the respective proportion of people without putting them on the spot? Actually, there is! If you want to learn about randomized response (and how to create flowcharts in R along the way) read on!
The question is how can you get a truthful result overall without being able to attribute a certain answer to any single individual. As it turns out, there is a very elegant and ingenious method, called randomized response. The big idea is to, as the name suggests, add noise to every answer without compromising the overall proportion too much, i.e. add noise to every answer so that it cancels out overall!

Click through for the process. It’s definitely a clever idea.
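To make the noise-cancels-out idea concrete, here is a small Python simulation of one common variant of randomized response. It is an assumption that this matches the linked post's exact procedure: each respondent flips a coin, answers truthfully on heads, and on tails lets a second coin flip pick the answer. The 30% true proportion and the function names are hypothetical.

```python
import random


def randomized_answer(truth, rng):
    """Return the reported answer under a two-coin scheme."""
    if rng.random() < 0.5:       # first coin: heads, answer truthfully
        return truth
    return rng.random() < 0.5    # tails: a second coin decides the answer


def estimate_true_proportion(answers):
    """Invert P(yes) = 0.5 * p + 0.25 to recover the population proportion p."""
    observed = sum(answers) / len(answers)
    return 2 * (observed - 0.25)


rng = random.Random(42)
true_p = 0.30  # hypothetical share of people whose honest answer is "yes"
population = [rng.random() < true_p for _ in range(100_000)]
answers = [randomized_answer(truth, rng) for truth in population]
estimate = estimate_true_proportion(answers)

print(f"estimated proportion: {estimate:.3f}")
```

No individual answer reveals much, since any "yes" may have come from the second coin, yet the aggregate estimate should land close to the true proportion.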

Neural Network Model Deployment with ONNX

Published 2020-07-22 by Kevin Feasel

Terry McCann gives us a primer on ONNX:

Let me introduce you to ONNX. ONNX or the Open Neural Network eXchange is a runtime which can take a model, that you have trained in PyTorch or Tensorflow and encapsulate it in an ONNX format which is executed on something running the ONNX runtime. This new model can be trained in Python and deployed on an ML.net application, with no need for integration coding.
We have spent a huge amount of time creating different Docker containers for different types of models, the Tensorflow container or the PyTorch container or a container running in our model in Spark, the list goes on. That way of working is kind of becoming defunct. ONNX really breaks that down into a simple standard runtime that you can start working with and you can deploy your model into multiple different environments and ensure that it runs on your database, on your website, on your mobile device and also at the edge.

Terry has a video as well. I like the fact that ONNX exists and also that it’s available in Azure SQL Edge (in part because I want it available on-premises as well).

Disk Caching with SQL Server VM Disks in Azure

Published 2020-07-22 by Kevin Feasel

Niko Neugebauer performs some tests:

Microsoft has been extremely clear in the best practices recommendation for the SQL Server workloads on Azure VMs:
– use read caching for the data drives/storage pools
– use no caching for the log drives/storage pools
– use read caching for the temp db drives/storage pools
Sounds simple and direct, doesn’t it?
Let me borrow your attention for the next couple of minutes pointing to some situations where you might want to reconsider the best practices.

But do read on for some important notes.

Python in Power BI Desktop

Published 2020-07-22 by Kevin Feasel

David Eldersveld dives into using Python as an external tool in Power BI:

Why use Python as an external “tool”? Even though Python isn’t a “tool” in the same sense as the “Big 3” community tools focused this month, I want to show how versatile the External Tools feature is. I also want to encourage people to use imagination and also explore how Power BI isn’t really as closed as some people think–at least the data model…
Some of these ideas are not exclusive to Python, but there’s enough variety in the Power BI and data science communities for people to possibly figure out if some of this might be useful within the context of their own environments, skills, and organizations.

David also follows up with a series of sample ideas.

Waiting on a SQL Agent Job to Complete

Published 2020-07-22 by Kevin Feasel

Thomas Rushton doesn’t have time to sit around and wait:

You know how it is. You need to wait for a job to complete before moving onto the next one… And, just sometimes, you need to do the same with SQL Server.

Read on for a script which waits for one SQL Agent job to complete before moving on and doing something else.
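The general shape of such a wait is a polling loop with a timeout. The sketch below shows that pattern in Python rather than Thomas's T-SQL. `wait_for_job` and the `is_running` callback are hypothetical names; in practice the callback would query the job's status on the server, which is not shown here.

```python
import time


def wait_for_job(is_running, poll_seconds=5.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until is_running() returns False; return False if the timeout hits first."""
    deadline = clock() + timeout_seconds
    while is_running():
        if clock() >= deadline:
            return False          # gave up: the job is still running
        sleep(poll_seconds)       # pause between status checks
    return True                   # the job is no longer running


# Hypothetical stand-in for a real status check: running twice, then done.
_states = iter([True, True, False])
finished = wait_for_job(lambda: next(_states), poll_seconds=0.01)

print("job finished:", finished)
```

The timeout matters: without it, a job that hangs would leave the caller waiting forever.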
