2017-04-18 – Curated SQL

Note how we’ve removed the private fields. Getting rid of the shared state automatically decoupled the three methods and made the workflow explicit. Without the shared state, the only way we can carry data around is by using the methods’ arguments and return values. And that is exactly what we did: all three members now explicitly state required inputs and possible outputs in their signatures.

This is the essence of functional programming. With honest method signatures, it’s extremely easy to reason about the code as we don’t need to keep in mind hidden relationships between its different parts. It’s also impossible to mess up with the invocation order. If we try, for example, to put the second line above the first one, the code simply wouldn’t compile:

This is one of many reasons why I’m fond of functional programming.

Comments closed

Understanding The Problem: Churn Edition

Published 2017-04-18 by Kevin Feasel

Emre Yazici points out the importance and difficulty of nailing down good definitions, using bookings churn as an example:

WHEN: Let’s say, we have made our design, constructed a model and obtained a good accuracy. However our model predicts (even with 95% accuracy) the customer who are going to churn in next day! That means our business department have to prevent (somehow, as explained before) those customers to churn in “one day”. Because next day, they will not be our customers. Taking an action to “3000” customers (let’s say) in one day only is impossible. So even our project predicts with very high accuracy, it will not be usefull. This approach also creates another problem: Consider that N months ago, a customer “A” was a happy customer and was working (providing us) with us (let’s say, it is a customer with %100 efficiency – happiness) and tomorrow it will be a customer who is not working with us (a customer with %100 efficiency – happiness). And we can predict the result today. So most probably, the customer has already got the idea to leave from our company in the last day. This is a deadend and we can not prevent the customer to churn at this point – because it is already too late.

So we need to have a certain time limit… Such that we need to be able to warn the business department “M months” before (customer churn) thus they can take action before the customers leave. Here comes another problem, what is the time limit… 2 months, 2.5 months, 3 months…? How do we determine the time, that we need to predict customers churn before (they leave)?

There’s a lot more to a good solution than “I ran a regression against a data set.”

Comments closed

Generating Homoglyphs In R

Published 2017-04-18 by Kevin Feasel

Bob Rudis shows how to create homoglyphs (character sequences which look similar to other character sequences) using a few R packages:

We can try it out with a very familiar domain:
(converted <- to_homoglyph("google.com"))
## [1] "ƍ၀໐|.com"
Now, that’s using all possible homoglyphs and it might not look like google.com to you, but imagine whittling down the list to ones that are really close to Latin character set matches. Or, imagine you’re in a hurry and see that version of Google’s URL with a shiny, green lock icon from Let’s Encrypt. You might not really give it a second thought if the page looked fine (or were on a mobile browser without a location bar showing).

Click through for more details, as well as information on punycode.

Comments closed

When Binomials Converge

Published 2017-04-18 by Kevin Feasel

Mala Mahadevan shows an example of the central limit theorem in action, as a large enough sample from a binomial distribution approximates the normal:

An easier way to do it is to use the normal distribution, or central limit theorem. My post on the theorem illustrates that a sample will follow normal distribution if the sample size is large enough. We will use that as well as the rules around determining probabilities in a normal distribution, to arrive at the probability in this case.
Problem: I have a group of 100 friends who are smokers. The probability of a random smoker having lung disease is 0.3. What are chances that a maximum of 35 people wind up with lung disease?

Click through for the example.

Comments closed

Regions In Management Studio

Published 2017-04-18 by Kevin Feasel

Andy Leonard has a short SSMS tips post about how you can take advantage of regions in your code:

There are a few tools and add-ins that support the creation of “regions” in T-SQL code. SQL Server Management Studio (SSMS) supports one way to separate sections of long-ish T-SQL scripts natively, by using begin/end:

It’s an interesting SSMS trick.

Comments closed

Understanding Probe Residuals

Published 2017-04-18 by Kevin Feasel

Daniel Janik explains what a probe residual is in an execution plan:

A probe residual is important because they can indicate key performance problems that might not otherwise be brought to your attention.

What is a probe residual?

Simply put, a probe residual is an extra operation that must be performed to compete the matching process. Extra being left over things to do.

Click through for an example brought about by implicit conversion.

Comments closed

The Story Of Nick

Published 2017-04-18 by Kevin Feasel

Kenneth Fisher tells the story of where the optimizer’s cost value comes from:

Obviously, it’s an important subject, right? And yet we keep seeing comments about how the cost is in seconds.

And to be fair, it is. It’s an estimate of how many seconds a query would take, if it was running on a developers workstation from back in the 90’s. At least that’s the story. In fact Dave Dustin (t) posted this interesting story today:

The best way to think of cost is as a probabilistic, ordinal, unitless value: 3 might be greater than 2; 1000 is almost certainly greater than 2; and “2 what?” is undefined.

Comments closed

Case-Insensitive Power Query Sorts

Published 2017-04-18 by Kevin Feasel

Cedric Charlier points out a comaprisonCriteria on Table.Sort in Power Query:

Have you already tried to sort a table based on a text field? The result is usually a surprise for most people. M language has a specific implementation of the sort engine for text where upper case letters are always ordered before lower case letters. It means that Z is always before a. In the example (here under), Fishing Rod is sorted before Fishing net.

The classical trick to escape from this weird behavior is to create a new column containing the upper case version of the text that will be used to sort your table, then configure the sort operation on this newly created column. This is a two steps approach (Three steps, if you take into account the need to remove the new column). Nothing bad with this except that it obfuscates the code and I hate that.

Click through to learn a more elegant way of sorting.

Comments closed

T-SQL Tuesday Roundup

Published 2017-04-18 by Kevin Feasel

Koen Verbeeck has his roundup of this month’s T-SQL Tuesday:

I asked the SQL Server community to write about their experience/opinion about the changing world we live in and how it impacts their daily job. The response was overwhelming: we had 30 participants in this months blog party!

Here’s an overview of everyone who participated. Take your time to read their stories, as they are very insightful, interesting or just plain fun to read.

There’s a lot of reading this month, with a well above-average turnout.

Comments closed

Power BI Line Dot Charts

Published 2017-04-18 by Kevin Feasel

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Line Dot Chart Power BI Custom Visual. The Line Dot Chart gives you the ability to make a more engaging line chart that can actually be animated across time.

This seems more like a fun chart than a useful chart, but I could see it being visually engaging for demonstrating relatively low-frequency events.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Day: April 18, 2017

Minimizing Shared State

Understanding The Problem: Churn Edition

Generating Homoglyphs In R

When Binomials Converge

Regions In Management Studio

Understanding Probe Residuals

What is a probe residual?

The Story Of Nick

Case-Insensitive Power Query Sorts

T-SQL Tuesday Roundup

Power BI Line Dot Charts