Press "Enter" to skip to content

Author: Kevin Feasel

DAX Variables in Iterators

Kasper de Jonge explains how you can use a variable in the middle of an iterator:

As explained in the blog post, the SUMX in this calculation will iterate over each row in the fact table, which will probably have multiple currencies with different values for each date. The Min(FactExchangeRate[Factor]) will be evaluated for each currency and date and will get the right value.

Now, those of you who have seen any of my sessions will ask why I am not using a variable, as I always tell everyone to do.

Click through for the answer and an example of where you can use a variable within an iterator.
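
As a taste of the pattern, here is a minimal DAX sketch, assuming a hypothetical FactSales table and SalesAmount column (FactExchangeRate[Factor] comes from the quote above): declaring the variable inside the iterator's row expression means it is evaluated once per row, and the CALCULATE forces context transition so the MIN picks up the current row's currency and date.

    SalesInLocalCurrency :=
    SUMX (
        FactSales,
        -- VAR sits inside the row expression, so it is evaluated
        -- for each row of FactSales rather than once per query.
        VAR ExchangeRate =
            CALCULATE ( MIN ( FactExchangeRate[Factor] ) )
        RETURN
            FactSales[SalesAmount] * ExchangeRate
    )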

Suspect Pages in msdb

Max Vernon explains what the suspect_pages table is in msdb:

When SQL Server detects corruption in a database, it reports that corruption immediately to the client who requested the data. But did you know SQL Server also stores the details about which pages have experienced corruption in the msdb database, in the suspect_pages table?

Read on to see the information you can get from this table, including a listing of what each event type means.
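
As a quick reference, a query along these lines surfaces the recorded pages (a minimal sketch; note that suspect_pages is capped at 1,000 rows, so it benefits from periodic cleanup):

    -- Pages SQL Server has flagged as suspect, most recent first.
    SELECT  DB_NAME(sp.database_id) AS database_name,
            sp.file_id,
            sp.page_id,
            sp.event_type,   -- e.g., 1 = 823/824 error, 2 = bad checksum, 3 = torn page
            sp.error_count,
            sp.last_update_date
    FROM msdb.dbo.suspect_pages AS sp
    ORDER BY sp.last_update_date DESC;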

Testing ETL Pipelines

Ed Elliott has started a new series on testing ETL pipelines:

We test in production; this means we have monitoring and do things like phased roll-outs using feature flags, or we roll out to select customers first, prove it, then roll it out to everyone else. Testing in production doesn’t mean hacking around to get some process to work. We don’t test “on production” (hacking); we test “in production” – while we are in production we are continually testing, and if anything goes wrong, we have alerts and can deal with it.

Testing pipelines feels difficult because there are so many moving pieces, but if you design for testability (e.g., being able to tee off samples of data, send test records through, etc.), things get easier.
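
As one concrete way to tee off data in SQL Server, the OUTPUT clause can copy whatever rows a load step touches into a side table you can inspect or assert against. A minimal sketch; all table names here are hypothetical:

    -- Load step that also tees the inserted rows into an audit table
    -- for later inspection or automated checks.
    INSERT INTO dbo.Sales
        (OrderId, Amount)
    OUTPUT
        inserted.OrderId, inserted.Amount, SYSUTCDATETIME()
        INTO dbo.SalesLoadAudit (OrderId, Amount, LoadedAtUtc)
    SELECT OrderId, Amount
    FROM dbo.SalesStage;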

Don’t Truncate Facts and Dimensions when Loading Data

Meagan Longoria explains why a truncate-and-reload strategy for data warehouses isn’t a good look:

Every once in a while, I come across a data warehouse where the data load uses a full truncate-and-reload pattern to populate a fact or dimension. While it may not be the end of the world for a small table, it does concern me, and I usually recommend redesigning the load. My thoughts below on why this is an anti-pattern are true for using the actual TRUNCATE TABLE statement as well as for executing a DELETE statement with no WHERE clause.

Read on for some great advice, including an exception to the rule.
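
For contrast, the usual alternative is an incremental load, roughly like the following (a minimal upsert sketch with hypothetical table names, not the specific design from the post):

    -- Update rows that changed, then insert rows that are new,
    -- leaving existing history in place.
    UPDATE d
    SET    d.CustomerName = s.CustomerName
    FROM dbo.DimCustomer AS d
    JOIN dbo.StageCustomer AS s
        ON s.CustomerKey = d.CustomerKey
    WHERE s.CustomerName <> d.CustomerName;

    INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName)
    SELECT s.CustomerKey, s.CustomerName
    FROM dbo.StageCustomer AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d
                      WHERE d.CustomerKey = s.CustomerKey);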

Equidistant Points and Missing Data in Excel

Stephanie Evergreen shows how you can bring in missing data points in Excel to ensure the axis is accurate:

Excel automatically spaces your intervals and labels equidistant from one another but it is assuming that your intervals actually are equidistant. In this graph, that’s not the case. We are missing the months of March, April, July, and August, when either no one was enrolled in the study or we have some missing data. But we can’t just gloss over those months. It isn’t truthful and it distorts the data display.

Click through for the solution.

IDEs and Cloudera Data Science Workbench

Bethann Noble walks us through some of the options available for IDEs operating against Cloudera Data Science Workbench:

Other coders on the team, including ML and DevOps engineers, often work in local IDEs such as PyCharm. These applications run locally on the user’s computer and connect to CDSW remotely over SSH for code completion and execution. They must be configured per user and are not associated at the project level in CDSW. The documentation provides sample instructions for the Professional Edition of PyCharm v2019.1.

They support both browser-based and local IDEs.

Why Transaction Logs are Zero-Initialized

Paul Randal explains why the transaction log needs to be zero-initialized before SQL Server starts up:

It’s all to do with crash recovery. SQL Server knows where crash recovery has to start for a database, but not where it ends – i.e. SQL Server does not persist the ‘most recent LSN’ for a database anywhere. This means it has to work out where the end of the log is (and by end, I mean the most recent log record persisted on disk, not the physical end of the log file).

Read on for the detailed explanation.
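
If you want to watch the zeroing happen on a test instance, trace flags 3004 and 3605 (well known in the community, including from Paul's own writing) echo file-initialization messages to the error log. A minimal sketch; the database name is hypothetical:

    -- Test instance only: surface zero-initialization messages.
    DBCC TRACEON (3004, 3605, -1);
    CREATE DATABASE ZeroInitDemo;
    EXEC sp_readerrorlog;   -- look for "Zeroing ... ZeroInitDemo_log"
    DBCC TRACEOFF (3004, 3605, -1);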

Test Those Restores

Bob Pusateri wants you to test your restores, probably right now:

I had a client that was hit by a ransomware virus, encrypting several of their systems including the database server. Not to worry, though, they had “full backups” of all the affected machines, done by a third-party backup utility. After taking a day to cleanse their network, they restored these backups onto their servers. Now it was just a simple matter of bringing all the applications back online, right? Well, not exactly…

Just because a backup completes with no error code doesn’t mean it’s really a successful backup.
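
For native SQL Server backups, the only real proof is a restore you actually run. A minimal sketch, assuming hypothetical paths, database, and logical file names:

    -- VERIFYONLY checks the backup media, not the database inside it.
    RESTORE VERIFYONLY FROM DISK = N'X:\Backups\Sales.bak';

    -- The real test: restore a copy and check its integrity.
    RESTORE DATABASE Sales_RestoreTest
    FROM DISK = N'X:\Backups\Sales.bak'
    WITH MOVE N'Sales' TO N'X:\RestoreTest\Sales.mdf',
         MOVE N'Sales_log' TO N'X:\RestoreTest\Sales_log.ldf';

    DBCC CHECKDB (Sales_RestoreTest) WITH NO_INFOMSGS;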

Getting the Largest Value Per Group

Erik Darling has a series on getting the highest value using CROSS APPLY. Part 1 covers the no-index route:

Let’s say you wanna get the highest thing. That’s easy enough as a concept.

Now let’s say you need to get the highest thing per user. That’s also easy enough to visualize.

There are a bunch of different ways to choose from to write it.

Part 2 covers the yes-index route:

In this round, row number had a tougher time than other ways to express the logic.

It just goes to show you, not every query is created equal in the eyes of the optimizer.

I don’t think I’m spoiling too much by saying that you really want a good index in place when using CROSS APPLY in this manner.
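
For reference, the general shape of the pattern looks like this (a minimal sketch with hypothetical tables; Erik's posts use the Stack Overflow database):

    -- Highest-scoring post per user. An index on
    -- dbo.Posts (OwnerUserId, Score DESC) is what makes this efficient.
    SELECT u.Id AS UserId,
           x.PostId,
           x.Score
    FROM dbo.Users AS u
    CROSS APPLY (SELECT TOP (1)
                        p.Id AS PostId,
                        p.Score
                 FROM dbo.Posts AS p
                 WHERE p.OwnerUserId = u.Id
                 ORDER BY p.Score DESC) AS x;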
