Author: Kevin Feasel

The scripts I ran to edit the rest of the databases looked similar to the below:

1

2

ALTER DATABASE msdb MODIFY FILE ( NAME = ‘MSDBDat’ , FILENAME = ‘M:\MSSQL\Data\MSDBDat.mdf’ );

ALTER DATABASE msdb MODIFY FILE ( NAME = ‘MSDBLog’ , FILENAME = ‘L:\MSSQL\Logs\MSDBLog.ldf’ );

Once I finished altering all of my database files to their new locations, I stopped the SQL Server Service in Services. I copied and pasted all MDF and LDF files to their correlated new destinations and then started the SQL Server Service once more.

That’s when I ran into the interesting issue of “Recovery in a Pending state”. Some digging and sleuthing brought me back to my scripts.

Read on for those causes.

Comments closed

Finding Different Casings In Case-Insensitive Columns

Published 2018-08-09 by Kevin Feasel

Louis Davidson walks through different permutations of case in a data set:

The other day, I had a problem with some data that I never dreamed I would ever see. In a case insensitive database, in a table’s column that was case insensitive, the customer was using the data as case sensitive. Firstly, let’s just go ahead and say it. “This was a sucky implementation.” But as is common, in my typical role as a data architect in the data warehousing team, I get to learn all sorts of interesting techniques for finding and dealing with “data” that has been used in “interesting” ways.

What is kind of interesting is actually figuring out what that duplicated data was. The case that I was dealing with wasn’t a kind of useful packed surrogate value, where you may use a base 62 number, with a-z, A-Z and 0-9 as characters. So 1, 2, … , 9, 0, a, b, c, … x, y, z, A, B.. etc. 1A1 is a different value in that sequence than 1a1, and is greater . Neat technique, and one that I have been threatening to develop using a SEQUENCE object, where you can pack in a lot of sequential data in a small number of bytes. No, this wasn’t a useful case such as this, in this case, one value was lower case, another had leading capitals. So perhaps “active customer” and “Active Customer”. Yeah, seriously, they meant different things.

Louis shows some of the nuance required in making this work.

Comments closed

Saving Extended Event Session Data To A Table

Published 2018-08-09 by Kevin Feasel

Matthew McGiffen shows how you can take the results of an Extended Events session and insert them into a table:

You just need to select a destination database connection and table name and the export starts. Be warned that it doesn’t default to the current database connection. I’ve fallen for that and overwritten the data in a table with the same filename on a different SQL instance – whoops!

If the option is greyed out when you open the menu it may be that your event data is still loading. If you look closely in the above screenshot you can see I have over 8 million events captured by this session, so it took a while to load before I was able to export.

There are a few gotchas that Matthew shows, but it’s a useful technique.

Comments closed

Serial Plans And Serial Zones

Published 2018-08-09 by Kevin Feasel

Daniel Hutmacher takes a look at what causes your T-SQL code to use a serial plan or at least go into a zone where parallelism is not an option:

Modifications to table variables

UPDATE, INSERT and DELETE on table variables cause a completely serial plan. SELECT statements, on the other hand, don’t neccessarily.

Scalar functions

Completely serial. Even when they’re used in computed columns in one of the tables. Even when you’re not referencing that actual column.

Not all computed columns generate serial plans – only those with scalar functions.

Read on for a number of other places. It turns out that this set is pretty stable from 2012 through to 2017.

Comments closed

Databricks Delta: Data Skipping And ZORDER Clustering

Published 2018-08-08 by Kevin Feasel

Adrian Ionescu explains a couple of concepts which can help make selective queries with Databricks much faster:

The general use-case for these features is to improve the performance of needle-in-the-haystack kind of queries against huge data sets. The typical RDBMS solution, namely secondary indexes, is not practical in a big data context due to scalability reasons.

If you’re familiar with big data systems (be it Apache Spark, Hive, Impala, Vertica, etc.), you might already be thinking: (horizontal) partitioning.

Quick reminder: In Spark, just like Hive, partitioning works by having one subdirectory for every distinct value of the partition column(s). Queries with filters on the partition column(s) can then benefit from partition pruning, i.e., avoid scanning any partition that doesn’t satisfy those filters.

The main question is: What columns do you partition by?
And the typical answer is: The ones you’re most likely to filter by in time-sensitive queries.
But… What if there are multiple (say 4+), equally relevant columns?

Read the whole thing.

Comments closed

Bayesian Approaches To The Cold Start Problem

Published 2018-08-08 by Kevin Feasel

John Cook explains what you can do with data-driven applications when you don’t yet have the data:

How do you operate a data-driven application before you have any data? This is known as the cold start problem.

We faced this problem all the time when I designed clinical trials at MD Anderson Cancer Center. We used Bayesian methods to design adaptive clinical trial designs, such as clinical trials for determining chemotherapy dose levels. Each patient’s treatment assignment would be informed by data from all patients treated previously.

But what about the first patient in a trial? You’ve got to treat a first patient, and treat them as well as you know how. They’re struggling with cancer, so it matters a great deal what treatment they are assigned. So you treat them according to expert opinion. What else could you do?

Read on for John’s solution.

Comments closed

Propagating The Last Non-Null Value With T-SQL

Published 2018-08-08 by Kevin Feasel

Tomaz Kastrun shows a great use of window functions in T-SQL:

So you have NULL values in your SQL Server table and you want to populate those NULL values with the last non-NULL value, based on a particular order. Once you have only one NULL values encapsulated between two populated values, there are quick and fast solutions. But what if you find a larger gap of NULL values and you want to populate these values as well?

Click through for a partial solution, followed by the real solution.

Comments closed

Building Resilient Microservices

Published 2018-08-08 by Kevin Feasel

Samir Behara has some tips for designing resilient microservices:

While architecting distributed cloud applications, you should assume that failures will happen and design your applications for resiliency. A Microservice ecosystem is going to fail at some point or the other and hence you need to learn embracing failures. Don’t design systems with the assumption that its going to be sunny throughout the year. Be realistic and account for the chances of having rain, snow, thunderstorms and other adverse conditions. In short, design your microservices with failure in mind. Things don’t go as per plan always and you need to be prepared for the worst case scenario.

If Service A calls Service B which in turn calls Service C, what happens when Service B is down? What is your fallback plan in such a scenario?

Can you return a pre-decided error message to the user?
Can you call another service to fetch the information?
Can you return values from cache instead?
Can you return a default value?

Microservices have their drawbacks, but one big advantage is that they tend to be concise enough that you can reason about them more clearly than kitchen sink applications. Planning ahead on potential failure modalities differentiates flaky services from robust services.

Comments closed

Joins When No Join Types Are Valid

Published 2018-08-08 by Kevin Feasel

Hugo Kornelis has a brain-teaser for us:

The query below can be executed in any version of the AdventureWorks sample database. Don’t bother understanding the logic, there is none. It is merely constructed to show how SQL Server handles what appears to be an impossible situation.

1

2

3

4

SELECT          d1.Name, d2.GroupName

FROM            HumanResources.Department AS d1

FULL OUTER JOIN HumanResources.Department AS d2

           ON   d2.DepartmentID            > d1.DepartmentID;

If you look at the descriptions of the various join operators in the Execution Plan Reference, you will see that this query poses the optimizer for what appears to be an insolvable problem: none of the join operators can be used for this query!

But it’s possible, and Hugo explains exactly what happens, as well as places where the optimizer could be better at solving the impossible (or at least marginally difficult).

Comments closed

Writing Long Strings To Output In SSMS

Published 2018-08-08 by Kevin Feasel

Bert Wagner shows us a few techniques for printing long strings in SSMS:

Erik Darling posts one solution to this problem in his T-SQL Tuesday #104 entry (as well as some other problems/solutions for lengthy SQL variables). Specifically he links to a SQL string printing script that will loop through the lengthy variable and print everything while maintaining formatting:

And while I like using that stored procedure on my primary server, I’m too lazy to install it every where I need it.

Instead, I have a couple of go-to solutions that work on all SQL Server instances 2008 forward.

The approach Bert outlines isn’t perfect, but it is definitely interesting and easier to write than the ones which work a bit better.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31