2019-06-21 – Curated SQL

Cohen’s d is a measure of effect size for the difference of two means that takes the variance of the population into account. It’s defined as
d = | μ₁ – μ₂ | / σ_pooled
where σ_pooled is the pooled standard deviation over both cohorts.

Read the whole thing.

Comments closed

Comparing Iterator Performance in R

Published 2019-06-21 by Kevin Feasel

Ulrik Stervbo has a performance comparison for for, apply, and map functions in R:

It is usually said, that for– and while-loops should be avoided in R. I was curious about just how the different alternatives compare in terms of speed.
The first loop is perhaps the worst I can think of – the return vector is initialized without type and length so that the memory is constantly being allocated.

The performance of map isn’t great, though the benefits to me are less about performance and more about readability. H/T R-bloggers

Comments closed

Calculating Value Above Replacement

Published 2019-06-21 by Kevin Feasel

Rob Collie explains how to calculate Value Above Replacement:

Let’s say you’re trying to improve the fuel efficiency of your car, and you have a budget of $100 to make those improvements. Furthermore, let’s say that you can only spend your budget on three different kinds of improvements: better tires, better spark plugs, and a better exhaust system.
The simple answer, of course, is to buy the Best of each part – the best Tires, the best Sparkplugs, and the best Exhaust System.
But what if Reality rears its head and rudely insists that the Best Costs More? What if there are a range of options available in each bucket, with varying price AND effectiveness? Then what?

Read on for the solution which has not nearly enough baseball.

Comments closed

An Intro to SQL Server Big Data Clusters

Published 2019-06-21 by Kevin Feasel

Mohammad Darab has a series on Big Data Clusters. Part zero explains what they are:

The *absolute* unique feature that BDCs offer that *no* other company or product offers is: Data Virtualization
The new and enhanced SQL Server 2019 Polybase feature comes with connectors to many different data sources: Oracle, Teradata, Apache Spark, MongoDB, Azure Cosmos DB, and even ODBC connectivity to IBM’s DB2, SAP HANA and excel (see image below)

Part one shows how to set one up:

So far, Microsoft does not have a simple way to create a Big Data Cluster. It’s a bit cumbersome of a process and the learning curve is a bit steep. However, Microsoft is currently working on making it easier to deploy a Big Data Cluster via Notebook in Azure Data Studio and eventually some type of “deployment wizard.” But for now, the only option is to do it the long way.

The series will continue, but check out the setup work.

Comments closed

Replaying Workloads with WorkloadTools

Published 2019-06-21 by Kevin Feasel

Gianluca Sartori shows an example of using the WorkloadTools application to replay a workload, including where the analytics server cannot directly access the production database:

Regardless of the method that you decided to use, at the end of the replays, you will have two distinct sets of tables containing the workload analysis data, sitting in different schemas in the same database or in completely different databases.
WorkloadViewer will let you visualize performance over time, as we have seen for a single workload analysis, but this time it will be able to show you data from both workloads, so that you can compare them.

This sort of production load testing is both important and difficult; WorkloadTools makes it easier.

Comments closed

Troubleshooting Query Performance Changes

Published 2019-06-21 by Kevin Feasel

Erin Stellato walks us through a troubleshooting guide when users complain about poorly-performing queries:

This is tale of troubleshooting…
When you unexpectedly or intermittently encounter a change in query performance, it can be extremely frustrating and troublesome for a DBA or developer. If you’re not using Query Store, a third-party application, or your own method to capture query data, then you probably don’t have information about what query performance looked like when things were good…you just know how it’s running now. I was working with a customer of Tim’s last week that had been chasing this exact problem for, in their words, years. They had recently upgraded to SQL Server 2016, and the problem was still occurring.

Strangely, “blame the network” didn’t appear in Erin’s post, so I don’t know if it’s comprehensive.

Comments closed

Workarounds for Updating Stats on Secondaries

Published 2019-06-21 by Kevin Feasel

Niko Neugebauer wants statistics updates on tables running on readable Availability Group secondary nodes:

Let’s list the basic known details for the possible solution(for the Enterprise Edition of the Sql Server that is):
– We can make the secondary replica readable and read the same data on it. (Not that you should do that by default, but if you really know what you are doing …)
– We can copy our object into the TempDB (yeah, your Multi-TB table is probably not the best candidate for this operation), or maybe into some other writable DB.
– We can write results in the shared folder between the replicas (let’s say in a text file into a File Share)
– We can export the BLOB object of the statistics out of the SQL Server
– We can import the BLOB object of the statistics into the statistics

Read the whole thing.

Comments closed

Hidden Patterns in SQL Agent Outputs

Published 2019-06-21 by Kevin Feasel

Solomon Rutzky has found patterns which cause SQL Agent step outputs not to write any details:

And that’s when it happened: I started getting a value back from QUOTENAME(ORIGINAL_DB_NAME()), and not just the square brackets with nothing between them. The name of a database was now between the square brackets. Why wasn’t that showing up before?

Solomon knows the how but not the why in this case.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Day: June 21, 2019

Using Cohen’s D for Experiments

Comparing Iterator Performance in R

Calculating Value Above Replacement

An Intro to SQL Server Big Data Clusters

Replaying Workloads with WorkloadTools

Troubleshooting Query Performance Changes

Workarounds for Updating Stats on Secondaries

Hidden Patterns in SQL Agent Outputs