Press "Enter" to skip to content

Curated SQL Posts

Plan Hashes and Ad-Hoc Workloads

Erin Stellato has an experiment for us:

Borrowing and adapting code from a previous post, Examining the Performance Impact of an Adhoc Workload, we will first create two stored procedures. The first, dbo.RandomSelects, generates and executes an ad hoc statement, and the second, dbo.SPRandomSelects, generates and executes a parameterized query.

Erin then shows how to review query stats and group together executions which are the same save for a change in literals. Read the whole thing.
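
As a sketch of the shape of that analysis (not Erin's exact code), sys.dm_exec_query_stats exposes a query_hash column which is identical for statements that differ only in their literal values, so grouping on it rolls the ad hoc variants together:

-- Statements sharing a query_hash but compiled as many distinct texts
-- are a telltale sign of an ad hoc (non-parameterized) workload.
SELECT
    qs.query_hash,
    COUNT(DISTINCT qs.sql_handle) AS distinct_statements,
    SUM(qs.execution_count) AS total_executions,
    SUM(qs.total_worker_time) AS total_cpu_time
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
ORDER BY distinct_statements DESC;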

Being a SQL Server Product Owner

Kevin Chant has an interesting role:

Now, I have had a few people ask me what a Product Owner actually does. Some say that it sounds like an architect role.

In reality, the role is one that’s mainly related to newer working practices like Scrum.

A Product Owner's responsibilities include talking to all the stakeholders for your team in the business and organising the priorities on your backlog board.

The concept makes sense, though this is the first time I’ve heard of such a role for a tool the engineers use rather than a product offered for sale.

Dealing with NULLs in Java with SQL Server 2019

Niels Berglund covers changes in SQL Server Machine Learning Services around Java code execution:

In the null values post mentioned above, I mentioned that there are differences between SQL Server and Java in how they handle null. So, when we call into Java from SQL Server, we may want to treat null values the same way as we do in SQL Server.

I wrote about this in the SQL Server 2019 Extensibility Framework & Java – Null Values post mentioned above. However, that post was written before SQL Server 2019 CTP 2.5. In CTP 2.5 Microsoft introduced the Java SDK, and certain things changed. Amongst the things that changed is the way we handle nulls when we receive datasets from SQL Server in our Java code.

Read on to learn how it works today.
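
For context, the T-SQL half of such a call looks like the sketch below; the class name is hypothetical, and the interesting null handling happens inside the Java class itself, which is what Niels's post covers:

-- Invoke a Java class from SQL Server 2019 (Java SDK era).
-- pkg.JavaNullDemo is a hypothetical class built against the
-- Microsoft Extensibility SDK; the NULL in nullable_col is what
-- the Java side must decide how to represent.
EXEC sp_execute_external_script
    @language = N'Java',
    @script = N'pkg.JavaNullDemo',
    @input_data_1 = N'SELECT 1 AS id, CAST(NULL AS int) AS nullable_col'
WITH RESULT SETS ((id int, nullable_col int));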

Plotting Three-Dimensional Linear Models

Sebastian Sauer shows a few techniques for visualizing linear models with two predictors:

Linear models are a standard way of predicting or explaining some data. Visualizing data is not only of didactic value but provides heuristic value too, as demonstrated by Anscombe's Quartet.

Visualizing linear models in 2D is straightforward, but visualizing linear models with more than one predictor is much less so. The aim of this post is to demonstrate some ways to visualize linear models with more than one predictor, using popular R packages. We will focus on 3D examples, that is, two predictors.

I have a strong bias against 3D visuals because they tend to be so difficult to see clearly. There are times when they’re necessary, though.

Joining RDDs in Spark

Brad Llewellyn takes us through more Spark RDD and DataFrame exercises, including joins:

We can make use of the built-in .join() function for RDDs.  Similar to the .aggregateByKey() function we saw in the previous post, the .join() function for RDDs requires a 2-element tuple, with the first element being the key and the second element being the value.  So, we need to use the .map() function to restructure our RDDs to store the keys in the first element and the original array/tuple in the second element.  After the join, we end up with an awkward nested structure of arrays and tuples that we need to restructure using another .map() function, leading to a lengthy code snippet.

This is a place where DataFrames make so much more sense.

Cardinality Estimation of Table Variables with Nullable Columns

Milos Radivojevic takes us through a quick demonstration of a change in SQL Server 2019:

By using the same formula, the estimated number of rows is:

SELECT 0.001992032*1000000
--1992.032000000

This is exactly what we see in the execution plan. OK, that was CL 140, let’s see how SQL Server 2019 handles this simple case.

When we switch to CL 150, the plan and estimations for the c1 column (non-nullable) are the same. However, the estimation for the nullable column is changed!

Read the whole thing.
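
A minimal sketch of the experiment's shape (not Milos's exact code): populate a table variable that has a nullable column, then compare the estimated row counts for the same predicate under the two compatibility levels:

-- Table variable with a non-nullable and a nullable column.
DECLARE @t TABLE (c1 int NOT NULL, c2 int NULL);

INSERT INTO @t (c1, c2)
SELECT n, CASE WHEN n % 502 = 0 THEN NULL ELSE n END  -- sprinkle in NULLs
FROM (SELECT TOP (1000000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b) AS x;

-- Same query under each CE; compare estimated rows in the two plans.
SELECT COUNT(*) FROM @t WHERE c2 IS NULL
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

SELECT COUNT(*) FROM @t WHERE c2 IS NULL
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'));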

Simulating ON DELETE CASCADE

Aaron Bertrand has put together a procedure which simulates what a cascading delete would look like given your existing foreign keys:

A user recently asked about an interesting scenario involving tables with foreign key relationships. They wanted to generate the DELETE statements that would allow them to manually delete from child tables first (perhaps in stages), based on criteria they define, and tell them – before performing the deletes – how many rows the operation would affect in each table. They wanted output like this:

DELETE dbo.ChildTable1 WHERE ParentID < <some constant>; -- This would delete 47 row(s).

DELETE dbo.ChildTable2 WHERE ParentID < <some constant>; -- This would delete 14 row(s).
...
DELETE dbo.ParentTable WHERE ID < <some constant>; -- This would delete 11 row(s).

Click through for the solution as well as several caveats.
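
As a taste of what's involved, the metadata walk starts at sys.foreign_keys, whose naming is famously backwards. A minimal sketch, assuming a dbo.ParentTable like the one in Aaron's example:

-- Find the immediate child tables of a given parent table.
-- In sys.foreign_keys, parent_object_id is the referencing (child)
-- table and referenced_object_id is the table being referenced.
SELECT
    child_table = QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id))
                + N'.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id)),
    foreign_key = fk.name
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.ParentTable');

The real procedure also has to recurse through grandchildren, cope with multi-column keys and self-references, and count affected rows before anything is deleted, hence the caveats.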

Improving Post-Join Aggregation Performance in Power Query

Imke Feldmann finds some nice performance improvements with aggregating data after a join using Power Query:

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joined tables. That's useful if multiple rows are returned for the rows of the table that has been joined to (left table).

But this method is extremely slow. Compared to "simply" expanding all values to new rows (which took around 5 seconds), the aggregation took around 50 seconds. The automatically generated code uses the Table.AggregateTableColumn function.

Read on to see two separate attempts to speed things up.

Scripting Out Linked Servers with Actual Passwords

Ajay Dwivedi shows how you can script out a linked server creation statement which includes actual passwords:

For moving Logins/Users, Microsoft provided the revlogin script, which made it easy to migrate logins without needing to know their passwords. But there is no easy approach for migrating Linked Servers with the actual password. This is where the dbatools cmdlet Copy-DbaLinkedServer becomes very handy. But what about the situation where we have to script out a Linked Server beforehand?

For this reason, based on a blog post by Antti Rantasaari, and using his code as the base script, I have created a cmdlet Get-LinkedServer in the SQLDBATools module which accepts a SqlInstance name as a parameter along with a -ScriptOut switch, and gives Drop/Create statements for the linked servers present on that local/remote SqlInstance.

As a quick note, SQLDBATools is not the same as dbatools.
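
For reference, the Create half of the generated output is ordinary T-SQL along these lines (the server name and remote login are hypothetical); the whole point of the Rantasaari technique is recovering the value that goes into @rmtpassword:

-- Recreate a linked server and its remote login mapping.
EXEC master.dbo.sp_addlinkedserver
    @server = N'RemoteServer',      -- hypothetical name
    @srvproduct = N'',
    @provider = N'SQLNCLI',
    @datasrc = N'RemoteServer';

EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'RemoteServer',
    @useself = N'False',
    @locallogin = NULL,
    @rmtuser = N'linked_login',     -- hypothetical remote login
    @rmtpassword = N'<recovered password>';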

Incremental Data Migration to Blob Storage

Ginger Daniel has started a series on data migration into Azure Blob Storage:

Part 1 of this article demonstrates how to upload multiple tables from an on-premises SQL Server to an Azure Blob Storage account as CSV files. I covered the basic steps to get data from one place to the other using Azure Data Factory; however, there are many alternative ways to accomplish this, and many details in these steps that were not covered. For a deep dive into the details, you can start here: https://docs.microsoft.com/en-us/azure/data-factory/introduction and https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal#create-a-pipeline

Part 1 was chock full of information, and it looks like Part 2 will be as well.
