2018-08-07 – Curated SQL

In this article we saw the relationship of the covariance matrix with linear transformation which is an important building block for understanding and using PCA, SVD, the Bayes Classifier, the Mahalanobis distance and other topics in statistics and pattern recognition. I found the covariance matrix to be a helpful cornerstone in the understanding of the many concepts and methods in pattern recognition and statistics.

Many of the matrix identities can be found in The Matrix Cookbook. The relationship between SVD, PCA and the covariance matrix are elegantly shown in this question.

Understanding covariance is critical for a number of statistical techniques, and this is a good way of describing it.

Comments closed

Calculating Cohort Lifetime Value With Excel And R

Published 2018-08-07 by Kevin Feasel

Eleni Markou shows how to calculate the lifetime value of a group of customers using two techniques:

A lot of ink has been spilled in developing various descriptions of the LTV, the majority of which ends up with mathematical formulas that are based on margin (m), retention rate (r) and discount rate (d) like the following (here):

However, this model appears to be not that realistic as it is based on a few quite restrictive assumptions:

Retention is assumed to be constant during the lifetime of a customer, i.e. the probability r of remaining retained remains the same across all months.

An infinite time horizon is assumed when calculating the present value of future cash flows.

The unit economics are supposed to be constant throughout lifetime which leads to a constant contribution margin.

Yet when dealing with an actual company, it easily becomes evident that none of the aforementioned conditions actually hold. Especially in early-stage businesses the size of the time periods across which you would like to calculate the LTV is month – or week – sized while at the same time the retention rate across them can vary significantly as the company’s products evolve quickly.

There’s a lot packed into that article, so give it a read.

Comments closed

T-SQL Formatters

Published 2018-08-07 by Kevin Feasel

Kenneth Fisher links to a set of T-SQL formatters:

I have a hard time working with code that’s so badly formatted I can’t read it. Normally I’d just format it myself, but this was a good dozen pages long. So what did I do? I ran to twitter screaming for help.

#sqlhelp What’s a good free T-SQL formatter?

— Kenneth Fisher (@_@) (@sqlstudent144) May 11, 2018

And I got a lot of responses! Here were the recommendations followed by who recommended them:

My favorite is DevArt’s SQL Complete, which is a paid product, but covers almost perfectly our 30-some page T-SQL formatting guideline.

Comments closed

Alerting On Missed Log Backups

Published 2018-08-07 by Kevin Feasel

Jamie Wick shows us how to build out an alert if we’ve exceeded a threshold for time since the last log backup:

The primary reason most transaction logs start to (unexpectedly) grow is that the transaction log hasn’t been backed up. This is usually caused by one of two situations: the log backup job failed, or the log backup job didn’t start. Most options for backing up SQL transaction logs (3rd party software, SQL Agent job, or maintenance plan) will include an alert/notification mechanism for when the job fails. Where some backup options are lacking is in notifying that a (log) backup that should have been taken, wasn’t. If the backup (or management) software doesn’t include an alert for missing backups, SQL Server (2008 & newer) has a solution. Policy Based Management (PBM) includes a policy (Last Log Backup Date) that will send an email notification when the last backup date/time exceeds a specified threshold.

A good use for a woefully underappreciated tool.

Comments closed

SSMS: Analyze Actual Execution Plan

Published 2018-08-07 by Kevin Feasel

Grant Fritchey shows us something that the SSMS tools team snuck in on us:

One of the many new sets of functionality introduced in SQL Server Management Studio 17 is the new option “Analyze Actual Execution Plan.” If Microsoft continues down this path, there will be a lot of useful functionality at some point. If you haven’t yet looked at Analyze Actual Execution Plan, well, read on.

I hope they do expand this out. I can see it being very beneficial, but it needs to look at a lot more than just cardinality estimations.

Comments closed

Azure SQL Data Warehouse Restore Points

Published 2018-08-07 by Kevin Feasel

Arun Sirpal explains how backups work with Azure SQL Data Warehouse:

The question is how are backups done with Azure SQL DW?

It is very different from Azure SQL DB (which you would expect). Azure SQL DW has a totally different architecture to its classic database counter-part. Restore points are the key here. Automatic ones are taken throughout the day and are kept for seven days only. Worst case scenario is the time between the restore points will be eight hours hence giving an eight hour RPO (Recovery Point Objective).

You can also create manual restore points, as Arun shows.

Comments closed

Implementation Matters: CTEs In Postgres And SQL Server

Published 2018-08-07 by Kevin Feasel

Brent Ozar looks at a couple of places where Postgres and SQL Server differ in implementation details:

In SQL Server, if you write this query:

1

2

3

4

With AllPosts AS (SELECT * FROM StackOverflow.dbo.Posts)

SELECT *

  FROM AllPosts

  WHERE Id = 1;

SQL Server builds a query plan for the entire operation at once, and passes the WHERE clause filter into the CTE. The resulting query plan is efficient, doing just a single clustered index seek.

In Postgres, CTEs are processed separately first, and subsequent WHERE clauses aren’t applied until later. That means the above query works just fine – but performs horribly. You’ll get much better results if you include your filters inside each CTE, like this:

1

2

3

With AllPosts AS (SELECT * FROM StackOverflow.dbo.Posts WHERE Id = 1)

SELECT *

  FROM AllPosts;

That’s less than ideal.

The comments are valuable here as well.

Comments closed

SSL Provider Error: 31 With SQL Server In Docker

Published 2018-08-07 by Kevin Feasel

Andrew Pruski walks us through fixing a connection error:

I recently bought a Dell XPS 13 running Ubuntu 16.04 and ran into an issue when connecting SQL Operations Studio (version 0.31.4) to SQL 2017 CU9 running in a docker container. Other people seem to encountering this issue as well so am posting it so that it may be of some help to someone in the future.

The error generated was: –

System.Data.SqlClient.SqlException (0x80131904): A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: SSL Provider, error: 31)

The full error can be viewed here

Read on for the solution.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Day: August 7, 2018

A Geometric Depiction Of Covariance

Calculating Cohort Lifetime Value With Excel And R

T-SQL Formatters

Alerting On Missed Log Backups

SSMS: Analyze Actual Execution Plan

Azure SQL Data Warehouse Restore Points

Implementation Matters: CTEs In Postgres And SQL Server

SSL Provider Error: 31 With SQL Server In Docker