2017-02-10 – Curated SQL

Yesterday I published the blog post Dynamically Changing Shapes in Power BI, which seemed to be quite popular in social media. It showed a lot of people were struggling with this issue and that some people had also found (alternative) solutions for it. In the comments of that blog post, you can find a solution proposed by Jason Thomas (blog | twitter). Jason has quite the Power BI & SSRS knowledge and he has found a cool trick to solve our changing images problem. In this blog post, I’m going to explain the solution step-by-step.

Click through to see the solution.

Comments closed

High-Compression JSON With vNext

Published 2017-02-10 by Kevin Feasel

Jovan Popovic gives an example of 25x compression of JSON data using a clustered columnstore index in vNext:

CLUSTERED COLUMNSTORE INDEXES provide extreme data compression in SQL Server and Azure SQL Database. With NVARCHAR(MAX) support in CCI indexes you can use them on your JSON data stored is database and get high 25x compression. Therefore, CCI is a perfect solution if you need to store a large volume of JSON data in your SQL Database.

ContosoDW database is publicly available for download, so you can use this database and the script below to re-create this table and try this in your environment.

I’m curious whether this will also apply to non-JSON data.

Comments closed

Finding Disproportionate Outliers With DAX

Published 2017-02-10 by Kevin Feasel

Rob Collie uses DAX to find disproportionate sentiment on a per-group basis:

My wife loves to travel AND she loves data, so it’s no surprise that she showed me the infographic in question.

“Oh come on, California residents don’t visit the Philippines more often than any other country.” That was my first reaction. (Mexico is the overwhelming #1 destination, basically for every state… because Cancun. And because there are so many Mexican-Americans).

“Ah… but California residents DO visit the Philippines disproportionately more often than they ‘should,’ according to national averages!” That was my dawning second realization (and confirmed by the fine print of the Orbitz article, even though the article’s title suggests otherwise.)

So, how do we do that in DAX? Pretty simply, actually.

I’m mentally working out whether this could be useful in anomaly detection.

Comments closed

Spark Data Frame Checkpoints

Published 2017-02-10 by Kevin Feasel

Jean Georges Perrin introduces checkpoints on Spark data frames:

Basically, I use a checkpoint if I want to freeze the content of my data frame before I do something else. It can be in the scenario of iterative algorithms (as mentioned in the Javadoc) but also in recursive algorithms or simply branching out a data frame to run different kinds of analytics on both.

Spark has been offering checkpoints on streaming since earlier versions (at least v1.2.0), but checkpoints on data frames are a different beast.

This could also be very useful for a quality control flow: perform operation A, and if it doesn’t generate good enough results, roll back and try operation B.

Comments closed

Which Data Types Can Create Statistics?

Published 2017-02-10 by Kevin Feasel

Raul Gonzalez figures out which data types cannot be part of statistics:

Yeah, there you go, all these _WA_Sys_ stats tell me they have been automatically created (there is a flag in sys.stats if you don’t believe me) but I can see there are only 31, where I created 34 columns.

That’s funny, let’s see which data types did get statistics.

The results are pretty interesting.

Comments closed

SQL Server ODBC Driver 13.1 For Linux

Published 2017-02-10 by Kevin Feasel

Meet Bhagdev announces a new version of the SQL Server ODBC driver for Linux:

Added

BCP API support
- You can use functions through the ODBC driver as described here on Linux.
Support for user-defined KeyStoreProvider for Always Encrypted
- You can now user-defined/created AE Column Master Key keystore providers. Check out code samples and more information here.
Ubuntu 16.10 support
- Developed a package Ubuntu 16.10 for an apt-get experience.
Dependency on the platform unixODBC Driver Manager instead of the custom unixODBC-utf16 Driver Manager
- This avoids conflicts with applications/software that depends on the platform unixODBC Driver Manager.

No groundbreaking additions, but there are a couple nice fixes in the update.

Comments closed

Full-Text Search

Published 2017-02-10 by Kevin Feasel

Kendra Little gives the scoop on full-text indexing:

The “dirty little secret” about full-text search indexes is that they don’t help with ‘%blabla%’ predicates.

Well, it’s not a secret, it’s right there in the documentation.

A lot of us get the impression that full-text search is designed to handle “full wildcard” searches, probably just because of the name. “Full-Text Searches” sounds like it means “All The Searches”. But that’s not actually what it means.

Kendra’s take is a bit more optimistic than mine; I’m definitely more inclined to dump text out to a Lucene-based indexing system (like Solr or ElasticSearch), as they’ll typically perform faster and solve problems that full-text cannot. Some of that may just be that I was never very good at full-text indexing, though.

Comments closed

When To Define Clustered Index Columns On Non-Clustered Indexes

Published 2017-02-10 by Kevin Feasel

Kim Tripp explains when to include a clustered index column on a non-clustered index column’s definition:

So, when SHOULD you explictly define the clustering key columns in a nonclustered index? When they ARE needed by the query.

This sounds rather simple but if the column is USED by the query then the index MUST have the column explicitly defined. Yes, I realize that SQL Server will add it… so it’s not necessary NOW but what if things change? (this is the main point!)

One of the more common cases I could think of is multi-part clustered indexes, like on a junction table.

Comments closed

Modifying SSMS Shortcuts

Published 2017-02-10 by Kevin Feasel

Slava Murygin shows how to update shortcuts in Management Studio:

A lot of people in the Internet complain about their version of SSMS “forgot” some hot-key combinations. The oldest complain I remember was about the most useful combination “Ctrl-R”.

The reason why SSMS “forgets” is within code sharing and reusability with other Microsoft development products.
If you have that problem, most probably I have (or had in the past) installed something else from Microsoft, such as Development Studio, Data Tools etc.

It can get annoying when another tool clobbers your expected shortcuts.

Comments closed

Azure Disk Encryption

Published 2017-02-10 by Kevin Feasel

Melissa Coates configures Azure Disk Encryption for an already-existing Azure VM:

As I discussed in my previous blog post, I opted to use Azure Disk Encryption for my virtual machines in Azure, rather than Storage Service Encryption. Azure Disk Encryption utilizes Bitlocker inside of the VM. Enabling Azure Disk Encryption involves these Azure services:

Azure Active Directory for a service principal

Azure Key Vault for a KEK (key encryption key) which wraps around the BEK (bitlocker encryption key)

Azure Virtual Machine (IaaS)

Following are 4 scripts which configures encryption for an existing VM. I initially had it all as one single script, but I purposely separated them. Now that they are modular, if you already have a Service Principal and/or a Key Vault, you can skip those steps. I have my ‘real’ version of these scripts stored in an ARM Visual Studio project (same logic, just with actual names for the Azure services). These PowerShell templates go along with other ARM templates to serve as source control for our Azure infrastructure.

The Powershell scripts are straightforward and clear, so check them out.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28

Day: February 10, 2017

More Dynamically Changing Shapes In Power BI

High-Compression JSON With vNext

Finding Disproportionate Outliers With DAX

Spark Data Frame Checkpoints

Which Data Types Can Create Statistics?

SQL Server ODBC Driver 13.1 For Linux

Added

Full-Text Search

When To Define Clustered Index Columns On Non-Clustered Indexes

Modifying SSMS Shortcuts

Azure Disk Encryption