Curated SQL – Page 1099 – A Fine Slice Of SQL Server

Case Classes In Scala

Published 2019-02-14 by Kevin Feasel

Shubham Dangare explains what case classes are in Scala:

Case class is scale way to allow pattern matching on an object without requiring a large amount of boilerplate. All you need to do is add a single case keyword modifier to each class that you want to pattern matching using such modifier makes scala compiler add some syntactic conveniences to your class and compiler add companion object(with the apply method)
Adds factory method with the name of the class this means that for instance, you can write StringValue(“X”) to construct a StringValue object instead of using new StringValue(“X”)

Given how useful case classes are in Spark, it’s good to know how they operate. For more background on the topic, Alessandro Lacava has a post from a few years back describing the topic well.

Comments closed

SSIS Error “Deserializing The Package”

Published 2019-02-14 by Kevin Feasel

Andy Leonard troubleshoots an odd error in SSIS:

Exception deserializing the package “Operation is not valid due to the current state of the object.”. (Microsoft.DataTransformationServices.VsIntegration)
As a professional consultant who has been blogging about SSIS for 12 years and authored and co-authored a dozen books related to Microsoft data technologies, my first response was:
“Whut?!”

That is a reasonable first response. Fortunately, Andy also had a second response which was more helpful in finding the root cause.

Comments closed

Saving An ADF Pipeline As A Template

Published 2019-02-14 by Kevin Feasel

Rayis Imayev shares with us how you can save an Azure Data Factory pipeline as a template:

Azure Data Factory (ADF) provides you with a framework for creating data transformation solutions in the Microsoft cloud environment. Recently introduced Template Gallery for ADF pipelines can speed up this development process and provide you with helpful information to create additional activity tasks in your pipelines.

We naturally long to seek if something standard can be further adjusted. That custom design is like ordering a regular pizza and then hitting the “customize” button in order to add a few toppings of our choice. It would be very impressive then to save this customized “creation” for future ordering. And Azure Data Factory has a similar option to save your custom data transformation solutions (pipelines) as templates and reuse them later.

Click through to see how you can do just that.

Comments closed

No Type Equivalence In M

Published 2019-02-14 by Kevin Feasel

Imke Feldmann notes an oddity in types in Power Query:

But this function will not return any matches. I also tried out a (potentially) slower version using Table.SelectColumns(Types, each [Value] = x[Types]) – but still no match.
What I found particularly frustrating here was, that in some cases, these lookups or filters on type-columns worked.

That behavior seems odd to me. Imke shares a link from Microsoft which explains that the behavior occurs, but the why behind it eludes me.

Comments closed

An Overview of Regression Testing

Published 2019-02-14 by Kevin Feasel

Ust Oldfield gives us a primer on regression testing:

There are a variety of methods and techniques that can be used in the design and execution of regression tests. These are:
– Retest All
– Test Selection
– Test Case Prioritisation

Regression testing is really nice to have in place because it keeps you from looking like a buffoon for accidentally reintroducing a bug you’ve previously quashed.

Comments closed

Saving To Excel From Azure Data Studio

Published 2019-02-14 by Kevin Feasel

Bob Pusateri shows us how you can export to Excel from Azure Data Studio:

In SQL Server Management Studio, there’s no single-step way to save a result set to Excel. Most commonly I will just copy/paste a result set into a spreadsheet, but depending on the size of the result set and the types of data involved, that doesn’t always play nicely.
But Azure Data Studio does it WAY better, trust me. If you want that result set in a spreadsheet, just save it as one and poof – you have an Excel file!

Considering that Excel is the most popular BI tool, it makes sense to support it.

Comments closed

Hiding Work: The Nested Loop Operator

Published 2019-02-14 by Kevin Feasel

Erik Darling explains that the nested loop operator is like a duck: there’s more going on beneath the surface than it lets on:

I’m going to talk about my favorite example, because it can cause a lot of confusion, and can hide a lot of the work it’s doing behind what appears to be a friendly little operator.
Something to keep in mind is that I’m looking at the actual plans. If you’re looking at estimated/cached plans, the information you get back may be inaccurate, or may only be accurate for the cached version of the plan. A query plan reused by with parameters that require a different amount of work may have very different numbers.

I like nested loop joins a lot, but there’s a big difference between a loop running a few dozen times and a loop running a couple hundred thousand times, even if the operator doesn’t show you that immediately.

Comments closed

Solving The Monty Hall Problem With R

Published 2019-02-13 by Kevin Feasel

Miroslav Rajter builds a Monty Hall problem simulator using R:

The original and most simple scenario of the Monty Hall problem is this: You are in a prize contest and in front of you there are three doors (A, B and C). Behind one of the doors is a prize (Car), while behind others is a loss (Goat). You first choose a door (let’s say door A). The contest host then opens another door behind which is a goat (let’s say door B), and then he ask you will you stay behind your original choice or will you switch the door. The question behind this is what is the better strategy?

This is something that puzzled me for a very long time. This is fundamentally a Bayesian problem built around processing new information, and once I understood that, the answer was a lot clearer. H/T R-Bloggers.

Comments closed

Control Table Keys In cdata

Published 2019-02-13 by Kevin Feasel

John Mount announces a new feature in the cdata package:

In our cdata R package and training materials we emphasize the record-oriented thinking and how to design a transform control table. We now have an additional exciting new feature: control table keys.
The user can now control which columns of a cdata control table are the keys, including now using composite keys (that is keys that are spread across more than one column). This is easiest to demonstrate with an example.

Read on for an example of how you can use this.

Comments closed

Using Calendar Tables

Published 2019-02-13 by Kevin Feasel

I have a post up on using calendar tables:

There’s one problem with picking a SQL Saturday in April: Easter and Passover tend to run right around that time, and nobody wants a SQL Saturday on Passover or the day before Easter. Unfortunately, our calendar table doesn’t include holiday information. So let’s add it!

Working with holidays and working with fiscal years versus calendar years are just two of the uses of calendar tables. But they’re the only two that I show.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts