Press "Enter" to skip to content

Curated SQL Posts

In-Memory Columnstore Updates

Niko Neugebauer looks at improvements in SQL Server 2016 SP1 with respect to adding columnstore indexes to memory-optimized tables:

With an incredible Service Pack 1, Microsoft has triumphantly announced that all editions (Standard, Web, Express and even Local) will get the most advanced programming capabilities of Columnstore, In-Memory, Database Snapshot, Compression, Partitioning & many others, plus some incredible features for T-SQL (CREATE OR ALTER) and Execution Plan details (actual information on processed rows, trace flags used, etc.). At the same time, there are some quite important improvements under the hood that will make you want to use Service Pack 1 for SQL Server 2016 immediately. One of these features is the fast addition of a Columnstore Index to Memory-Optimised tables. Let’s put it to the test by restoring a copy of the free ContosoRetailDW database:

These results look nice.
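
For reference, the operation Niko is timing looks something like the following; a minimal sketch against a hypothetical memory-optimized table name, since the exact table definition comes from his post:

-- Hypothetical memory-optimized copy of a ContosoRetailDW fact table.
-- SP1 makes adding the clustered columnstore index here much faster.
ALTER TABLE dbo.FactOnlineSales_InMem
ADD INDEX CCI_FactOnlineSales CLUSTERED COLUMNSTORE;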


Two-Way T Tests

Mala Mahadevan shows how to write a two-way T test in R and T-SQL:

I can do the same calculation of the T value using T-SQL. I cannot calculate the p value from T-SQL, as that comes from a table, but it is possible to look it up. I imported the set of values into a table called WalkingSteps with two columns, walkerAsteps and walkerBsteps. For doing the math on the T value, the formula stated here may be useful. My T-SQL code is below:

The R code is a bit shorter, although the T-SQL code isn’t bad either.
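
To give a flavor of the T-SQL side, here is a minimal sketch of a two-sample T value over the WalkingSteps table from the excerpt; it assumes unequal variances (Welch’s formula), so Mala’s exact arithmetic may differ:

-- T value for two independent samples (Welch's formula).
-- walkerAsteps and walkerBsteps come from the post; the rest is standard aggregates.
SELECT
    (AVG(CAST(walkerAsteps AS FLOAT)) - AVG(CAST(walkerBsteps AS FLOAT)))
    / SQRT(VAR(walkerAsteps) / COUNT(walkerAsteps)
         + VAR(walkerBsteps) / COUNT(walkerBsteps)) AS TValue
FROM dbo.WalkingSteps;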


The Risk Of Custom Indexes

David Klee shows the risk of creating custom indexes on third-party software:

The error file tells you specifically which indexes it does not like. It is found at:

C:\Users\(youraccount)\AppData\Local\Temp\vcsUpgrade\vcdb_req.err

You’ll find the error message towards the bottom of the document. My specific item was:

1 [42000](50000) [Microsoft][SQL Server Native Client 11.0][SQL Server]ERROR ! Extra indexes: VPX_EVENT.HFX_VPX_EVENT_Cover01; VPX_STAT_COUNTER.IX_VPX_STAT_COUNTER_STAT; VPX_TASK.HFX_VPX_TASK_Cover01;

For well-maintained third-party vendor software which doesn’t require you to add indexes to support the product at any scale beyond what a developer needs for basic testing, this isn’t an issue.  And if you ever find that piece of software, write the company a note of congratulations for being the first…
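
If you do hit this during an upgrade, the remediation is to drop the extra indexes before re-running the installer and re-create them afterward if you still need them. A minimal sketch against the index names from the error above, assuming they live in the dbo schema:

-- Script out these index definitions first so they can be re-created post-upgrade.
DROP INDEX HFX_VPX_EVENT_Cover01 ON dbo.VPX_EVENT;
DROP INDEX IX_VPX_STAT_COUNTER_STAT ON dbo.VPX_STAT_COUNTER;
DROP INDEX HFX_VPX_TASK_Cover01 ON dbo.VPX_TASK;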


Solving Math Problems With T-SQL

Erik Darling shows the Collatz Conjecture in T-SQL:

The only problem in there that could really be written in SQL was the Collatz Conjecture. It states that when you take any number, and if it’s even you divide it by 2, and if it’s odd you multiply it by 3 and add 1, you’ll always eventually end up with 1. Fair enough. There’s probably a use for that somewhere out there.

Writing it in SQL was super easy, of course. Just throw a CASE expression at it.
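
Something like this recursive CTE captures the idea; a minimal sketch, not Erik’s exact code:

-- Collatz sequence from an arbitrary starting point.
-- The CASE expression does all of the mathematical work.
DECLARE @start BIGINT = 27;

WITH Collatz AS
(
    SELECT @start AS n, 0 AS step
    UNION ALL
    SELECT
        CASE WHEN n % 2 = 0 THEN n / 2 ELSE 3 * n + 1 END,
        step + 1
    FROM Collatz
    WHERE n > 1
)
SELECT n, step
FROM Collatz
OPTION (MAXRECURSION 0);  -- sequences easily exceed the default limit of 100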

I think Erik is a little harsh on SQL as a mathematical language.  Yeah, there are other languages I’d go to first for many types of math problems (R, F#, and Python being examples), but SQL can solve quite a few mathematical problems.


R Visuals In Power BI

Ginger Grant discusses how to display R visuals in Power BI:

I hope that some day this list becomes much longer, but it is a good start. If your company has lots of R visuals and you wish to migrate them to Power BI, chances are some of the libraries you are using are not here. If you are interested in having your library added to the list of 352, go to the Ideas page of Power BI and request that your library be added, as I know Microsoft looks at this page to determine what to release in the future. Someone has requested that igraph be added, and since it hasn’t received a lot of votes yet (hint) it is probably low on the priority list.

Even so, this list does cover a lot of the most commonly used packages.


Memory-Optimized Table Types

Randolph West has a three-part series in which he looks at using memory-optimized table types and table-valued parameters to increase application performance.  Part 1 introduces the concept:

In other words, for these short-lived temp tables, there’s not only an added benefit of no longer worrying about referring to tempdb..#table, but you also get a massive performance improvement as well.

Part 2 specifies the scale of performance improvements:

The test is very simple and makes use of a technique that one of my customers uses extensively: writing some rows to a temp table, so that another process can reuse those values as parameters, and perform an action, after which it destroys the temp table.

Unfortunately, the method my customer uses to populate a temp table does not work on Azure SQL Database, because they explicitly refer to the temp tables with three-part naming (tempdb..#temptable), which is not permitted.

For the sake of the exercise, I will have a process that writes to a data structure and compare the times.

Part 3 repeats the test in Azure SQL Database:

I’m going to use the same WHILE loop again, but instead of a million runs, I’ll do 1000, 10,000 and 100,000, because I’m paying for this instance of Azure SQL Database (I picked a Premium P1, with 125 DTUs) and I’m a cheapskate. I doubt the 125 DTUs is even enough to run a million times for the fourth option.

Even in SQL Server 2014, this was a good use of In-Memory OLTP.  With the improvements in 2016, this becomes a viable option for a lot more workloads.
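
The pattern Randolph benchmarks looks roughly like this; a minimal sketch with hypothetical names, which also requires the database to have a MEMORY_OPTIMIZED_DATA filegroup:

-- Memory-optimized table types must declare at least one index.
CREATE TYPE dbo.SalesIdList AS TABLE
(
    SalesId INT NOT NULL,
    INDEX IX_SalesId NONCLUSTERED (SalesId)
)
WITH (MEMORY_OPTIMIZED = ON);
GO

-- Used like a temp table, but with no tempdb..#table naming and no tempdb I/O.
DECLARE @ids dbo.SalesIdList;
INSERT INTO @ids (SalesId) VALUES (1), (2), (3);
SELECT SalesId FROM @ids;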


Stream Computing Platform

Ravi Peri shows how to set up the Stream Computing Platform for .NET (SCP.Net) library and kick off a job:

SCP.Net generates a zip file consisting of the topology DLLs and dependency jars.

It uses Java (if found in the PATH) or .NET to generate the zip. Unfortunately, zip files generated with .NET are not compatible with Linux clusters.

If you’re interesting in working with a Storm topology while writing .NET code, check this out.


Loading Columnstore Data

I have a post on an issue I had with loading columnstore data:

In this design, I have data coming from the transactional system, undergoing some ETL processing, and going into a staging table on the warehouse.  From there, I perform the remainder of the ETL work and insert into a rowstore table.  This rowstore table has the same attribute names and data types as the columnstore table, but instead of having a clustered columnstore index, it has a standard B-tree index and can have additional non-clustered indexes.  From there, I expose the combination table using a view which simply unions the two sets of data so the application doesn’t have to see rowstore versus columnstore tables.
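
The view itself is the simple part; a minimal sketch with hypothetical table names:

-- The application reads the view and never needs to know which table holds a row.
CREATE VIEW dbo.Fact_Sales
AS
SELECT SaleDate, CustomerID, Amount
FROM dbo.Fact_Sales_CCI    -- clustered columnstore: the bulk of the history
UNION ALL
SELECT SaleDate, CustomerID, Amount
FROM dbo.Fact_Sales_Stage; -- B-tree rowstore: recent, still-churning rows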

If you’re doing nightly insertions, results tend to be a lot better.  But if you need close-to-but-not-quite real-time data, there are still ways to solve the problem.


Thinking About The Data Lake

Ust Oldfield gives architectural hints on Azure Data Lake Store:

It is very easy to treat a data lake as a dumping ground for anything and everything. Microsoft’s sales pitch says exactly this – “Storage is cheap, store everything!!”. We tend to agree – but if the data is completely malformed, inaccurate, out of date or completely unintelligible, then it’s no use at all and will confuse anyone trying to make sense of the data. This will essentially create a data swamp, which no one will want to go into. Bad data & poorly managed files erode trust in the lake as a source of information. Dumping is bad.

This is how you get data swamps (a term which I’m so happy is catching on).  Read the whole thing.
