Kevin Feasel – Page 1026

Simplifying Columnstore

Published 2019-06-27 by Kevin Feasel

Monica Rathbun takes us through concepts behind columnstore indexes:

Now, I admit when these first were introduced in SQL Server 2012 I found them very intimidating (additionally, you couldn’t update them directly). For me, anytime you say columnstore, my mind tends to set off alarms saying wait stay away, this is too complicated. So, in this post I am going to try and simplify the feature for you.
To do that first you need to understand some terminology and the difference between a columnstore index and a row store index (the normal kind we all use). Let’s start with the terminology.

There are some interesting complications around columnstore indexes but for analytical or warehousing queries, they’re excellent.

Comments closed

More Testing of Inline Scalar UDFs

Published 2019-06-27 by Kevin Feasel

Erik Darling makes a FROIDian slip:

The idea behind FROID is that it removes some restrictions around scalar valued functions.
1. They can be inlined into the query, not run per-row returned
2. They don’t force serial execution, so you can get a parallel plan
If your functions already run pretty quickly over a small number of rows, and the calling query doesn’t qualify for parallelism, you may not see a remarkable speedup.

Even in that case, Erik argues that you can still get some benefits from SQL Server 2019 bringing those scalar UDFs inline.

Comments closed

SQL Server 2019 CTP 3.1 Released

Published 2019-06-27 by Kevin Feasel

Anshul Rampal announces CTP 3.1 of SQL Server 2019:

The big data clusters feature continues to add key capabilities for its initial release in SQL Server 2019. This month, the release extends the Apache Spark™ functionality for the feature by supporting the ability to read and write to data pool external tables directly as well as a mechanism to scale compute separately from storage for compute-intensive workloads. Both enhancements should make it easier to integrate Apache Spark™ workloads into your SQL Server environment and leverage each of their strengths. Beyond Apache Spark™, this month’s release also includes machine learning extensions with MLeap where you can train a model in Apache Spark™ and then deploy it for use in SQL Server through the recently released Java extensibility functionality in SQL Server CTP 3.0. This should make it easier for data scientists to write models in Apache Spark™ and then deploy them into production SQL Server environments for both periodic training and full production against the trained model in a single environment.

Click through to learn more about what has changed.

Comments closed

Shortest Path with SQL Server 2019

Published 2019-06-27 by Kevin Feasel

Shreya Verma shows off a new feature in SQL Server 2019 CTP 3.1:

We are expanding the graph database capabilities with several new features. In this blog we discuss one of these features that is now available for public preview in SQL Server 2019, Shortest Path, which can be used to find a shortest path between two nodes in a graph. The shortest path function can also be used to compute a transitive closure or for arbitrary length traversals.

With CTP3.1, we are introducing a new function SHORTEST_PATH, which can be used inside MATCH to find a shortest path between any 2 nodes in a graph or to perform arbitrary length traversals. Users can specify a pattern they would like to search for in the graph using a regular expression style syntax.

I’d be interested in seeing how well it performs. But it’s good to see graph functionality fleshed out a bit more.

Comments closed

Best Practices for Query Store

Published 2019-06-26 by Kevin Feasel

Erin Stellato ties together a number of best practices around Query Store:

I’m a huge fan of Query Store, which regular readers may know, but there’s a need to write a bit more about Query Store best practices. This isn’t a “you must use this feature” post, this is a “here is what you must know if you want to use this feature” post.
I have a lot of content about Query Store, but maybe what’s really important gets lost amongst everything else. Glenn prompted me to write this, after I worked with two customers last week that ran into issues related to their configuration of Query Store. Listed below are the things you must know before you enable Query Store. If you already have Query Store enabled and running, I recommend reviewing the information to ensure you are following best practices.

Click through for the full set of practices and links to additional details.

Comments closed

Lasso and Ridge Regression in Python

Published 2019-06-26 by Kevin Feasel

Kristian Larsen shows off a few regression techniques using Python:

Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients variables are most strongly associated with the response variable. Therefore, when you conduct a regression model it can be helpful to do a lasso regression in order to predict how many variables your model should contain. This secures that your model is not overly complex and prevents the model from over-fitting which can result in a biased and inefficient model.

Read on for demonstrations.

Comments closed

Building a Big Data Cluster

Published 2019-06-26 by Kevin Feasel

Mohammad Darab continues a series on SQL Server Big Data Clusters in Azure Kubernetes Service:

To kick off the Big Data Cluster “Default configuration” creation, we will execute the following Powershell command:
mssqlctl cluster create
That will first prompt us to accept the license terms. Type y and Enter.

Mohammad takes us through the default installation, which requires only a few parameters before it can go on its merry way.

Comments closed

Identity Inserts: One Table at a Time

Published 2019-06-26 by Kevin Feasel

Bert Wagner shows that you can only insert with IDENTITY_INSERT = ON for one table at a time:

Ok, simple enough to fix: we just need to do what the error message says and SET IDENTITY_INSERT ON for both tables:
SET IDENTITY_INSERT dbo.User_DEV ON; SET IDENTITY_INSERT dbo.StupidQuestions_DEV ON;
And… it still didn’t work:
IDENTITY_INSERT is already ON for table 'IdentityTest.dbo.User_DEV'. Cannot perform SET operation for table 'dbo.StupidQuestions_DEV'.

Click through for the ramifications and your alternative.

Comments closed

Deadlock Check Frequency

Published 2019-06-26 by Kevin Feasel

Dave Bland clarifies how frequently deadlock checks occur:

Because deadlocks happen when two task permanently block each other, without a deadlock, both process will simply block forever. Of course this could never be good in a production system. It is important that these situations be identified and dealt with in some manner. This is where SQL Server database engine steps in, it is frequently searching the lock manager looking for deadlocks.

Click through for the answer.

Comments closed

CONCATENATEX in DAX

Published 2019-06-26 by Kevin Feasel

Alberto Ferrari shows us how the CONCATENATEX function works in DAX:

At this point, extracting the first row from Results which would contain the string to produce in the report, is sufficient. TOPN is the function extracting that row, but there is a major drawback here: in case of a tie, TOPN returns all values involved.
In case TOPN returns multiple rows, it is necessary to show them one after the other so to make it clear that the result is not unique. CONCATENATEX is an iterator that concatenates strings and produces a single string out of a table.

This is a good demonstration of a useful function.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel