
Day: January 4, 2017

Why Care About The Data?

Rob Collie explains some of the benefits of Power BI with respect to its ability to dig into data problems:

Many problems/opportunities can ONLY be addressed by making thousands of small behavior changes, on an ongoing basis, out in the “trenches.”  In other words, there are some which CANNOT be addressed by a small number of smart decisions made at the top of the org.  Improved pricing behavior by the sales force for instance.  Improved quality of service by field technicians, for another.

How do you make that happen?  How do you influence a large number of people to behave differently on an ongoing basis, especially when they operate at a distance (both organizationally and physically) from you?

Fancy management consulting firms charge six to seven figures to do this, but with Power BI and/or Power Pivot, it’s no longer necessary to make that outlay.  We’ve seen organizations crack this nut with in-house resources, modest budgets, and short timelines – even though the impact is absolutely enterprise-wide.  It takes some thought and iteration, for sure, but primarily, you “just” need to build a good scorecard.

This is a non-technical post, but I like it a lot because it helps motivate decisions and gives you a good reason to investigate whether your tools are doing the work you need them to do.


Batches And Stored Procedure Creation

Steve Jones has a warning for when you create a stored procedure:

Why is my select code in there? That was designed to be a piece of test code. Shouldn’t the BEGIN..END after the AS define my procedure?

Actually it doesn’t. The procedure doesn’t end until the CREATE PROCEDURE statement is terminated. That termination comes by ending the batch. The CREATE PROCEDURE documentation has this limitation:

The CREATE PROCEDURE statement cannot be combined with other Transact-SQL statements in a single batch.

This means that anything else you have in that batch will be considered as part of the procedure, regardless of BEGIN..END.

Judicious usage of the GO statement can help keep you out of trouble.
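
To make the batch boundary concrete, here is a minimal Python sketch that mimics how a client tool splits a script into batches on GO lines. It is an illustration only, not how SSMS or sqlcmd is implemented: it recognizes just a bare GO on its own line, and the procedure and table names (dbo.GetOrders, dbo.Orders) are hypothetical.

```python
import re

# Assumption: only a bare GO on its own line is treated as a batch separator.
GO_LINE = re.compile(r"^\s*GO\s*$", re.IGNORECASE)


def split_batches(script: str) -> list[str]:
    """Split a T-SQL script into batches on lines that contain only GO."""
    batches, current = [], []
    for line in script.splitlines():
        if GO_LINE.match(line):
            batch = "\n".join(current).strip()
            if batch:
                batches.append(batch)
            current = []
        else:
            current.append(line)
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches


def procedure_batch(script: str):
    """Return the batch holding CREATE PROCEDURE, i.e. the text that becomes the procedure."""
    for batch in split_batches(script):
        if batch.upper().startswith("CREATE PROCEDURE"):
            return batch
    return None


# Hypothetical script: the test SELECT shares a batch with the procedure.
WITHOUT_GO = """\
CREATE PROCEDURE dbo.GetOrders
AS
BEGIN
    SELECT OrderID FROM dbo.Orders;
END
-- test code, meant to run once
SELECT COUNT(*) FROM dbo.Orders;
"""

# Same script with GO ending the batch right after the procedure body.
WITH_GO = """\
CREATE PROCEDURE dbo.GetOrders
AS
BEGIN
    SELECT OrderID FROM dbo.Orders;
END
GO
-- test code, meant to run once
SELECT COUNT(*) FROM dbo.Orders;
"""

print(procedure_batch(WITHOUT_GO))  # includes the test SELECT
print(procedure_batch(WITH_GO))     # stops at END
```

In the first script the test query is stored as part of the procedure; in the second, GO ends the batch and the test query runs on its own.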


Partitioned Views With Polybase

I look at using SQL Server 2000-style partitioning with a set of external tables:

Using a view, we were able to create a “partitioned” Polybase experience, similar to what we had in SQL Server 2000.  This form of poor man’s partitioning allows us to segment out data sets and query them independently, something which can be helpful when storing very large amounts of data off-site and only occasionally needing to query it.  The thing to remember, though, is that if you store this in Azure Blob Storage, you will need to pull down the entire table’s worth of data to do any processing.

This leads to a concept I first heard from Ginger Grant:  pseudo-StretchDB.  Instead of paying for what Stretch offers, you get an important subset of the functionality at a much, much lower price.  If you do store the data in Azure Blob Storage, you’re paying pennies per gigabyte per month.  For cold storage, like a scenario in which you need to keep data around to keep the auditors happy but your main application doesn’t use that information, it can work fine.  But if you need to query this data frequently, performance might be a killer.

For Polybase tables without the ability to perform external pushdown, a good partitioning strategy is probably one of the two best ways to improve performance; the other is creating a Polybase scale-out cluster.
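
The shape of such a view can be sketched in a few lines of Python. This is a stand-in under loud assumptions: SQLite replaces SQL Server, ordinary tables replace Polybase external tables, and the table and column names (sales_2014, sale_year, and so on) are hypothetical. It shows only the UNION ALL pattern, not Polybase performance behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical yearly tables; in the Polybase scenario each would be an external table.
for year in (2014, 2015, 2016):
    cur.execute(f"CREATE TABLE sales_{year} (sale_id INTEGER, amount REAL)")

cur.executemany("INSERT INTO sales_2014 VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
cur.executemany("INSERT INTO sales_2015 VALUES (?, ?)", [(3, 30.0)])
cur.executemany("INSERT INTO sales_2016 VALUES (?, ?)", [(4, 40.0), (5, 50.0)])

# The "partitioned" view: one UNION ALL branch per segment, tagged with its year.
cur.execute("""
CREATE VIEW sales_all AS
SELECT 2014 AS sale_year, sale_id, amount FROM sales_2014
UNION ALL
SELECT 2015 AS sale_year, sale_id, amount FROM sales_2015
UNION ALL
SELECT 2016 AS sale_year, sale_id, amount FROM sales_2016
""")


def total_for_year(year: int) -> float:
    """Query the combined view, filtered to one segment."""
    row = cur.execute(
        "SELECT SUM(amount) FROM sales_all WHERE sale_year = ?", (year,)
    ).fetchone()
    return row[0]


def row_count_all() -> int:
    """Count rows across every segment through the view."""
    return cur.execute("SELECT COUNT(*) FROM sales_all").fetchone()[0]


print(total_for_year(2016))
print(row_count_all())
```

Each underlying table can still be queried on its own, which is the "segment out data sets and query them independently" part of the approach.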


Nested Sets

Nate Johnson explains the nested sets model:

Put another way, the #3 rule is that you should always operate on the tree (CrUD ops) using stored-procedures and/or triggers that encapsulate all the nitty-gritty details of maintaining the correct position values during said insert/update/delete operations.  Of course, somebody is responsible for writing those stored-procs.  Any volunteers?  Easy now, don’t raise your hands all at once!  Generally, this responsibility falls to the DBA(s) or DBDev(s).

The problem at-hand, in my current situation, was that of “moving a sub-tree”, i.e. taking a node and all its descendants, and moving it to place it under another “parent” node.  In some models, and/or in some languages, this is a simple recursive operation.  However, SQL is not spectacular at recursion — after all, we’re working in a relational engine — so let’s try to play to its strengths:

This is a straightforward look at one of the major hierarchical models in relational design.  Well worth a look.
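
For a feel of what moving a sub-tree involves, here is a minimal Python sketch of one common set-based approach: close the gap the sub-tree leaves, open a gap under the new parent, then slide the sub-tree into it. This is not Nate's stored procedure; the function name, the node names, and the choice to append as the last child are all assumptions made for illustration.

```python
def move_subtree(tree, node, new_parent):
    """Return a new {name: (lft, rgt)} mapping with `node` and all of its
    descendants moved to become the last child of `new_parent`."""
    s_lft, s_rgt = tree[node]
    p_lft, p_rgt = tree[new_parent]
    if s_lft <= p_lft and p_rgt <= s_rgt:
        raise ValueError("cannot move a node under itself or its own descendant")

    width = s_rgt - s_lft + 1  # number of position values the sub-tree occupies
    subtree = {n: lr for n, lr in tree.items() if s_lft <= lr[0] and lr[1] <= s_rgt}
    rest = {n: lr for n, lr in tree.items() if n not in subtree}

    # 1. Close the gap left behind by the removed sub-tree.
    rest = {
        n: (l - width if l > s_rgt else l, r - width if r > s_rgt else r)
        for n, (l, r) in rest.items()
    }

    # 2. Open a gap of the same width just inside the new parent's right edge.
    insert_at = rest[new_parent][1]
    rest = {
        n: (l + width if l >= insert_at else l, r + width if r >= insert_at else r)
        for n, (l, r) in rest.items()
    }

    # 3. Slide the sub-tree into the gap by a constant offset.
    offset = insert_at - s_lft
    moved = {n: (l + offset, r + offset) for n, (l, r) in subtree.items()}
    return {**rest, **moved}


# Hypothetical tree: A has children B and C; B has D and E; C has F.
tree = {
    "A": (1, 12),
    "B": (2, 7),
    "D": (3, 4),
    "E": (5, 6),
    "C": (8, 11),
    "F": (9, 10),
}

print(move_subtree(tree, "B", "C"))
```

Each step is a single pass over all rows with a simple predicate, which maps naturally onto set-based UPDATE statements rather than recursion.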


TempDB And Parallelism

Kendra Little looks at cases when a query uses multiple tempdb data files:

As you might guess, things may not always get evenly accessed, even if you have evenly sized tempdb files. One of my queries did a select into a temp table. Although it used all four tempdb files whether or not it went parallel, there were more file_read events against the first tempdb file than against the other three.

It’s an interesting look at this specific question, and also a good example of pedagogical technique.


BatchMode Execution

Sunil Agarwal describes BatchMode execution with columnstore indexes:

You may be wondering what is this magic number of 900 rows within a batch? Well, when executing a query in BatchMode, SQL Server allocates a 64 KB structure to group the rows. The number of rows in this structure can vary between 64 and 900 depending upon the number of columns selected. For the example above, there are two columns that are referenced and X marks the rows that qualified in the BatchMode structure shown in the picture below. If SCAN is part of a bigger query execution tree, the pointer to this structure is passed to the next operator for further processing. Not all operators can be executed in BatchMode. Please refer to Industry leading analytics query performance for details on BatchMode Operators.

Under the right circumstances, BatchMode execution can be a major performance benefit.
