2019-09-10 – Curated SQL

I presumed the graph would depict cancellation rates for a set of products, with “Tier 2 with Promotion” at the top, representing the highest cancellation rate. When we get to the data, though, that’s not the case. Rather, the graph shows the inverse metric (retention rate) with Tier 2 + Promo as the bottom line with the lowest retention rate. Eventually I figured this out—but only because I spent time studying the data to make this determination!

Click through for the initial visual as well as a couple of alternatives.

Comments closed

Python and R Data Reshaping

Published 2019-09-10 by Kevin Feasel

John Mount takes us through a couple of data shaping packages:

The advantages of data_algebra and cdata are:
– The user specifies their desired transform declaratively by example and in data. What one does is: work an example, and then write down what you want (we have a tutorial on this here).
– The transform systems can print what a transform is going to do. This makes reasoning about data transforms much easier.
– The transforms, as they themselves are written as data, can be easily shared between systems (such as R and Python).

Let’s re-work a small R cdata example, using the Python package data_algebra.

Click through for the example.

Comments closed

Dealing with Thousands of Databases

Published 2019-09-10 by Kevin Feasel

Andy Levy wraps up a Q&A series on dealing with thousands of databases:

When you started, did you know what your position was going to look like 1 month, 6 months, 1 year, 5 years from then? How accurate has that been so far?
I’ve only been at my current job for about 2 1/2 years, but I can speak to the shorter intervals. I’m going to be intentionally vague in spots here as I don’t want to disclose too much.

And if you’d like to hear Andy talk about migrating 8000 databases, Carlos Chacon and I interviewed Andy for the SQL Data Partners podcast.

Comments closed

Azure Backup for SQL Server VM Pains

Published 2019-09-10 by Kevin Feasel

John Sterrett has run into a few issues with Azure Backup for SQL Server VMs:

Having the ability to backup new databases automatically is taken for granted. So much, that I noticed that Azure Backup for SQL Server VM’s will not automatically backup new databases for you. That’s right. Make sure you remember to go in and detect and select your new database every time you add a database or you will not be able to recover.
Azure Backup for SQL Server VM’s has an interesting feature called Autoprotect. This should automatically backup all your databases for you. Unfortunately, this does not work. Yes, I double-checked by enabling autoprotect for a VM and I added a new database. The database didn’t get backed up so I had to manually add the database.

Some of these seems like it’s easy enough to fix, so hopefully the product gets better over time.

Comments closed

3D Effects in Power BI

Published 2019-09-10 by Kevin Feasel

David Eldersveld shows how you can use orthographic projection in Power BI:

The projection from three coordinates to a 2D plane is achieved by adding the following two measures. Be sure to adjust the column references and what-if parameter names at the top to correspond to your own data.
Here’s my “Ortho x” measure. The initial six bold values are what you’d need to adjust to your own data and parameter names.

David lays out a face, which is pretty neat.

Comments closed

Creating Graph Tables in SQL Server

Published 2019-09-10 by Kevin Feasel

Mala Mahadevan continues a series on graph tables in SQL Server:

I have highlighted in red what SQL Server adds to the table – the two system columns – graph id, which is bigint, and node id, which is nvarchar and stores json, and the unique index to help with queries.
We can also see from constraint type that this table is similar to other relational tables – it can be enabled for replication and can have related delete or update actions defined on it if need be.

This post gives a bit more insight into how graph tables work in SQL Server under the covers.

Comments closed

Batch Mode Normalization

Published 2019-09-10 by Kevin Feasel

Paul White digs into batch mode normalization and its consequences for performance:

I mentioned in the introduction that not all eight-byte data types can fit in 64 bits. This fact is important because many columnstore and batch mode performance optimizations only work with data 64 bits in size. Aggregate pushdown is one of those things. There are many more performance features (not all documented) that work best (or at all) only when the data fits in 64 bits.
In our specific example, aggregate pushdown is disabled for a columnstore segment when it contains even one data value that does not fit in 64 bits. SQL Server can determine this from the minimum and maximum value metadata associated with each segment without checking all the data. Each segment is evaluated separately.

Paul goes deep into the concept, making this well worth your while.

Comments closed

Migrating to a New Hashing Algorithm

Published 2019-09-10 by Kevin Feasel

Randolph West has some ideas about how to remove some of the pain of changing your hashing algorithm for passwords:

Let’s assume that you have inherited a database which stores passwords in cleartext in a column called Password. This is not a good scenario. For one thing, it is a temptation to access information we wouldn’t normally have access to, by using that person’s credentials. For another, due to a human trait called “using a computer,” people reuse passwords all the time. I do it myself and I’m writing a post about security (out of 1,089 items in my 1Password vault, 82 items contain reused passwords).

Randolph has a few ideas, but there’s one which I put into place a long time ago: automatic migration. A system was MD5 (it was a while ago) and I changed it to a better password-hashing algorithm. If the existing user record had a binary the length of an MD5 hash, I’d perform the lookup (to ensure that this was the correct user) and then re-hash using my new algorithm, which had a longer binary length. Over time, people migrated to the new system without any issue. Eventually, when the set of remaining users was small enough, we took more active measures. It wasn’t quite as good as the ideal of forcing resets on everyone, but that was deemed as going overboard for our user base.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Day: September 10, 2019

Choosing the Right Words on Visuals

Python and R Data Reshaping

Dealing with Thousands of Databases

Azure Backup for SQL Server VM Pains

3D Effects in Power BI

Creating Graph Tables in SQL Server

Batch Mode Normalization

Migrating to a New Hashing Algorithm