
Month: October 2019

Date Buckets in SQL Server

Andy Mallon needs to create groups of date ranges in T-SQL:

Here’s where my math nerd-iness comes out… There’s a little trick for “bucketizing” numbers (in this case, turning “Months” into “Month Buckets”):

– Take a number
– Divide it by your bucket size
– Round that number down to a whole number – we’ll call this the “divided number”
– Multiply the “divided number” by the bucket size – this is your bucket floor
– Add one to the “divided number” and then multiply that by your bucket size – this is your bucket ceiling

I was promised that there would be no math.
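
For the arithmetically inclined, here is a minimal T-SQL sketch of those steps, assuming a hypothetical months-ago value and a bucket size of six months:

-- A minimal sketch of the bucketing steps above: divide, round down,
-- then multiply back out for the floor and ceiling.
DECLARE @BucketSize int = 6;
DECLARE @MonthsAgo  int = 17;

SELECT
    MonthsAgo     = @MonthsAgo,
    -- Integer division rounds down for non-negative values, giving the "divided number"
    BucketFloor   = (@MonthsAgo / @BucketSize) * @BucketSize,
    BucketCeiling = ((@MonthsAgo / @BucketSize) + 1) * @BucketSize;

For 17 months and 6-month buckets, that lands in the 12-to-18 bucket.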


Creating a Better Index Maintenance Script

Erik Darling, despite being on Team Profiler, has something important to say:

If you’re the kind of person who cares about various caches on your server, like the buffer pool or the plan cache, then you’d wanna measure something totally different. You’d wanna measure how much free space you have on each page, because having a bunch of empty space on each page means your data will take up more space in memory when you read it in there from disk.

You could do that with the column avg_page_space_used_in_percent.

BUT…

Read the whole thing.
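
For context, that column comes from sys.dm_db_index_physical_stats, and it is only populated in SAMPLED or DETAILED mode. A minimal sketch of checking page fullness (the page-count cutoff here is an arbitrary choice to skip tiny indexes):

SELECT
    OBJECT_NAME(ips.object_id) AS table_name,
    i.name AS index_name,
    ips.avg_page_space_used_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
    AND i.index_id = ips.index_id
WHERE ips.page_count > 1000 -- arbitrary cutoff to skip tiny indexes
ORDER BY ips.avg_page_space_used_in_percent;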


Validating Upstream Data Quality with T-SQL

Ed Elliott has a pattern to try:

We can select those rows which we want into our real table which will either already have constraints enabled or we can enable the constraints after we load the data. There are a few problems with this approach, the first problem is that this isn’t very easy to debug. When you realise that your load process only loaded half of the expected rows, why didn’t it load the rest? Other problems include, what do we do with any failed rows – i.e. which rows failed? To answer this we need another query that is the reverse of the “get good rows”. Finally, this is quite a lot of SQL for a file with four columns and three rules, what if the rules are more complex and our data file has lots of columns? I tell you what happens if the rules are more complex: “a mess of SQL jank”, that’s what.
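
To make the row-based pattern concrete, here is a minimal sketch using a hypothetical staging.Customers table and made-up rules (Ed's example works from a four-column file with three rules):

-- Load only the rows that pass every rule (hypothetical table and rules).
INSERT INTO dbo.Customers (CustomerId, Email, Age)
SELECT s.CustomerId, s.Email, s.Age
FROM staging.Customers AS s
WHERE s.CustomerId IS NOT NULL
  AND s.Email LIKE '%@%'
  AND s.Age BETWEEN 0 AND 150;

-- Finding the failures means negating every rule, and NULLs in the
-- nullable columns need predicates of their own on top of that.
SELECT s.*
FROM staging.Customers AS s
WHERE s.CustomerId IS NULL
   OR s.Email NOT LIKE '%@%' OR s.Email IS NULL
   OR s.Age NOT BETWEEN 0 AND 150 OR s.Age IS NULL;

Even with three rules it starts to sprawl, which is exactly Ed's point.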

So what instead? 

Read on to learn about the alternative, which is an interesting approach. This is another case where a good SQL-based solution is almost a pivot of a good imperative solution: instead of thinking in rows, focus on columns first and let the rows sort themselves out.


Plan Hashes and Ad-Hoc Workloads

Erin Stellato has an experiment for us:

Borrowing and adapting code from a previous post, Examining the Performance Impact of an Adhoc Workload, we will first create two stored procedures. The first, dbo.RandomSelects, generates and executes an ad hoc statement, and the second, dbo.SPRandomSelects, generates and executes a parameterized query.

Erin then shows how to review query stats and group together executions which are the same save for a change in literals. Read the whole thing.
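
A minimal sketch of that grouping, using query_hash from sys.dm_exec_query_stats, which is the same for statements that differ only in their literal values (Erin's queries in the post go further than this):

SELECT
    qs.query_hash,
    COUNT(*) AS number_of_entries,
    SUM(qs.execution_count) AS total_executions,
    SUM(qs.total_worker_time) AS total_worker_time
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(*) > 1
ORDER BY number_of_entries DESC;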


Being a SQL Server Product Owner

Kevin Chant has an interesting role:

Now, I have had a few people ask me what a Product Owner actually does. Some say that it sounds like an architect role.

In reality, the role is one that’s mainly related to newer working practices like Scrum.

A Product Owner’s list of responsibilities includes talking to all the stakeholders for your team in the business and organising the priorities on your backlog board.

The concept makes sense, though this is the first time I’ve heard of such a role for a tool the engineers use rather than a product offered for sale.


Dealing with NULLs in Java with SQL Server 2019

Niels Berglund covers changes in SQL Server Machine Learning Services around Java code execution:

In the null values post mentioned above, I mentioned that there are differences between SQL Server and Java in how they handle null. So, when we call into Java from SQL Server, we may want to treat null values the same way as we do in SQL Server.

I wrote about this in the SQL Server 2019 Extensibility Framework & Java – Null Values post mentioned above. However, that post was written before SQL Server 2019 CTP 2.5. In CTP 2.5 Microsoft introduced the Java SDK, and certain things changed. Amongst the things that changed is the way we handle nulls when we receive datasets from SQL Server in our Java code.

Read on to learn how it works today.


Plotting Three-Dimensional Linear Models

Sebastian Sauer shows a few techniques for visualizing linear models with two predictors:

Linear models are a standard way of predicting or explaining some data. Visualizing data is not only of didactical value but provides heuristical value too, as demonstrated by Anscombe’s Quartet.

Visualizing linear models in 2D is straightforward, but visualizing linear models with more than one predictor is much less so. The aim of this post is to demonstrate some ways to visualize linear models with more than one predictor, using popular R packages. We will focus on 3D examples, that is, two predictors.

I have a strong bias against 3D visuals because they tend to be so difficult to see clearly. There are times when they’re necessary, though.


Joining RDDs in Spark

Brad Llewellyn takes us through more Spark RDD and DataFrame exercises, including joins:

We can make use of the built-in .join() function for RDDs.  Similar to the .aggregateByKey() function we saw in the previous post, the .join() function for RDDs requires a 2-element tuple, with the first element being the key and the second element being the value.  So, we need to use the .map() function to restructure our RDDs to store the keys in the first element and the original array/tuple in the second element.  After the join, we end up with an awkward nested structure of arrays and tuples that we need to restructure using another .map() function, leading to a lengthy code snippet.

This is a place where DataFrames make so much more sense.


Cardinality Estimation of Table Variables with Nullable Columns

Milos Radivojevic takes us through a quick demonstration of a change in SQL Server 2019:

By using the same formula, the estimated number of rows is:

SELECT 0.001992032*1000000
--1992.032000000

This is exactly what we see in the execution plan. OK, that was CL 140, let’s see how SQL Server 2019 handles this simple case.

When we switch to CL 150, the plan and estimations for the c1 column (non-nullable) are the same. However, the estimation for the nullable column is changed!

Read the whole thing.
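
If you want to experiment along those lines, here is a rough sketch of the kind of setup involved; the table shape and data distribution are hypothetical rather than Milos's actual repro. Run the final query under compatibility level 140 and again under 150 and compare the estimates for the nullable column:

-- Hypothetical repro shape: a table variable with a non-nullable column
-- and a nullable column, populated with 1,000,000 rows.
DECLARE @T TABLE
(
    c1 int NOT NULL,
    c2 int NULL
);

INSERT INTO @T (c1, c2)
SELECT TOP (1000000)
    n.rn,
    CASE WHEN n.rn % 500 = 0 THEN NULL ELSE n.rn % 500 END
FROM
(
    SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
) AS n;

-- OPTION (RECOMPILE) lets the optimizer see the table variable's row count
-- under the older compatibility level as well.
SELECT COUNT(*)
FROM @T
WHERE c2 = 1
OPTION (RECOMPILE);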


Simulating ON DELETE CASCADE

Aaron Bertrand has put together a procedure which simulates what a cascading delete would look like given your existing foreign keys:

A user recently asked about an interesting scenario involving tables with foreign key relationships. They wanted to generate the DELETE statements that would allow them to manually delete from child tables first (perhaps in stages), based on criteria they define, and tell them – before performing the deletes – how many rows the operation would affect in each table. They wanted output like this:

DELETE dbo.ChildTable1 WHERE ParentID < <some constant>; -- This would delete 47 row(s).

DELETE dbo.ChildTable2 WHERE ParentID < <some constant>; -- This would delete 14 row(s).
...
DELETE dbo.ParentTable WHERE ID < <some constant>; -- This would delete 11 row(s).

Click through for the solution as well as several caveats.
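
The heart of a generator like this is walking the foreign key metadata. A minimal sketch of finding the direct children of a parent table (using the dbo.ParentTable name from the sample output); Aaron's actual procedure goes much further, building the DELETE statements and the row counts:

-- Find tables that reference a given parent table via foreign keys.
-- Note: in sys.foreign_keys, parent_object_id is the *referencing* (child)
-- table and referenced_object_id is the referenced (parent) table.
SELECT
    child_table   = QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + N'.'
                    + QUOTENAME(OBJECT_NAME(fk.parent_object_id)),
    child_column  = QUOTENAME(cc.name),
    parent_column = QUOTENAME(pc.name)
FROM sys.foreign_keys AS fk
JOIN sys.foreign_key_columns AS fkc
    ON fkc.constraint_object_id = fk.object_id
JOIN sys.columns AS cc
    ON cc.object_id = fkc.parent_object_id
    AND cc.column_id = fkc.parent_column_id
JOIN sys.columns AS pc
    ON pc.object_id = fkc.referenced_object_id
    AND pc.column_id = fkc.referenced_column_id
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.ParentTable');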
