Press "Enter" to skip to content

Author: Kevin Feasel

Separating Totals In DAX With IsInScope

Kasper de Jonge shows how we can use the IsInScope function in DAX to separate out parts of hierarchies like totals:

Finally I drag in the Values column and the Hierarchy in a matrix (I also turned on the +/- icons so we can expand/collapse, which is another November feature). The first thing we see is that the matrix looks weird, with blanks and all.

The reason we are seeing this is that this is how the data is set up: we have stored totals and grand totals at the lowest level in the hierarchy. Instead, we want to show them at the level where they are correct. Also, we don’t want to show the aggregation created by the SUM.

Click through for the full example.
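
As a quick sketch of the pattern (the table, column, and measure names here are hypothetical, not Kasper’s), a measure can use ISINSCOPE to detect which hierarchy level is being evaluated and return the stored total for that level instead of the SUM:

    Hierarchy Value =
    SWITCH (
        TRUE (),
        -- At the leaf level, sum the detail rows as usual.
        ISINSCOPE ( Sales[City] ), SUM ( Sales[Value] ),
        -- At the Country level, surface the stored total row.
        ISINSCOPE ( Sales[Country] ),
            CALCULATE ( SUM ( Sales[Value] ), Sales[RowType] = "Country Total" ),
        -- Otherwise we are at the grand total.
        CALCULATE ( SUM ( Sales[Value] ), Sales[RowType] = "Grand Total" )
    )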

Automatically Enabling SQLCMD Mode In SSMS

Greg Low shows how to have every Management Studio tab open in SQLCMD mode:

Note the :CONNECT command is used to connect to another server.

Because everything else works pretty much the same, and you get a whole lot of additional options, you might choose to open all your new queries in SQLCMD mode. That’s easy to do.

SQLCMD mode is one of those things where I thought I’d use it a lot, but aside from deploying database projects, I don’t.  Granted, this could be a failure of imagination on my part.
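
For reference, the setting Greg walks through lives in SSMS under Tools > Options > Query Execution > SQL Server > General: “By default, open new queries in SQLCMD mode.” And as a small taste of what SQLCMD mode buys you (the server and variable names below are made up):

    :CONNECT DevServer01
    :SETVAR TargetDb CuratedSQL
    USE [$(TargetDb)];
    GO
    SELECT @@SERVERNAME AS server_name;
    GO
    -- Hop to a second server in the same script.
    :CONNECT DevServer02
    SELECT @@SERVERNAME AS server_name;
    GO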

Converting Binary To Hex With T-SQL

Dave Mason uses STRING_SPLIT to convert binary values to their hex equivalents:

I started pondering it for a bit and began to wonder if I could use the STRING_SPLIT function (new for SQL Server 2016) to convert a binary string to decimal. The thought process was to split the string into rows of CHAR(1) values, along with an in-string character position. Then I could take the “1” values, calculate the 2ˣ value, and sum them up for a decimal value.

I quickly realized I’d need to reverse the binary string. I also discovered the STRING_SPLIT function eliminated 0’s adjacent to 1’s. I don’t have much use for this code at the moment, but maybe there’s something here that you can take away.

Given some of the issues Dave ran into, it seems that a tally table-based solution might be a better choice.
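
For instance, a minimal tally-style sketch (my own code, not Dave’s) that converts a binary string to its decimal value without splitting anything:

    DECLARE @bin varchar(64) = '101101';

    -- One row per character position; sum the power of two for each
    -- position whose character is a 1.
    SELECT SUM(POWER(CAST(2 AS bigint), LEN(@bin) - t.n)) AS decimal_value
    FROM (
        SELECT TOP (LEN(@bin)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects
    ) AS t
    WHERE SUBSTRING(@bin, t.n, 1) = '1';
    -- Returns 45 for '101101'.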

Tempdb Blocking With Non-Clustered Columnstore Indexes

Ned Otter runs into a tricky issue:

I have a client that used Itzik Ben-Gan’s solution of creating a filtered nonclustered columnstore index to achieve batch mode on a rowstore (in fact, I proposed that the client consider it). They have an OLTP system and often perform YTD calculations. When they tested, processing time was reduced by 30 to 50 percent, without touching a single line of application code. If that ain’t low-hanging fruit, I don’t know what is.

However, during testing, I noticed some intermittent blocking that didn’t make sense to me. But I couldn’t nail it down, and they went live with the “filtered nonclustered columnstore index” solution.

Once they deployed – and there was a lot of concurrency – I could see what had eluded me during my proof of concept: blocking in tempdb.

Read on for the repro and check out Ned’s UserVoice bug report.
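
For context, the trick in question looks something like this (table and column names are hypothetical): a filtered nonclustered columnstore index whose predicate can never be true, so the index stays empty but still lets the optimizer consider batch mode for queries against the rowstore table:

    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_BatchMode
        ON dbo.Orders (OrderID)
        WHERE OrderID = -1 AND OrderID = -2;  -- contradictory, so always empty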

Priority Queuing In Azure SQL Data Warehouse

Matt How walks us through an improvement to Azure SQL Data Warehouse:

The concept of workload management is a key factor for Azure SQL DW, as there are only limited concurrency slots available and, depending on the resource class, these slots can fill up pretty quickly. Once the concurrency slots are full, queries are queued until a sufficiently sized slot opens up. Let’s recap what Resource Classes are and how they affect workload management.

A Resource Class is a pre-configured database role that determines how much resource is allocated to queries coming from users that belong to that role. For example, an ETL service account may use a “large” resource class and be allocated a generous amount of the server, whereas an analyst may use a “small” resource class and therefore only use up a small amount of the server with their queries.

There are actually two types of resource class, Dynamic and Static. The Dynamic resource classes grant a set percentage of memory to a query, and the actual value of this percentage varies as the Warehouse scales up and down. The key factor is that an xLargeRc (extra-large resource class) will always take up 70% of the server and will not allow any other queries to run concurrently. No matter how much you scale up the Warehouse, queries run with an xLargeRc will run one at a time. Conversely, queries run with a smallrc will only be allocated 4% of the server; as the Warehouse scales up, this 4% becomes a larger amount of resource and can therefore process data more quickly.

This looks like a useful addition.  Click through for a few examples of how it will work.
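
Since a resource class is just a pre-created database role, assigning one is a one-liner (the user name here is hypothetical):

    -- Dynamic resource classes: smallrc (the default), mediumrc, largerc, xlargerc.
    EXEC sp_addrolemember 'largerc', 'etl_service_account';

    -- To fall back to the default smallrc, drop the membership.
    EXEC sp_droprolemember 'largerc', 'etl_service_account';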

Tuning Apache Spark Applications

Vidisha Gupta has a few tips for tuning Apache Spark programs:

Data Serialization – Serialization plays an important role in increasing the performance of any application. Spark provides two serialization libraries:

  • Java Serialization: By default, Spark uses Java’s ObjectOutputStream framework, which can work with any class that implements java.io.Serializable. This serialization is flexible but slow, and it creates large serialized formats for many classes.

  • Kryo Serialization: Spark can use the Kryo library to serialize objects. It is much faster and more compact, but it does not support all serializable types, so we must register the classes we want serialized. Kryo uses indices instead of full class names to identify data types, which reduces the size of the serialized data and thereby increases performance. We can initialize our Spark conf by setting the property spark.serializer to org.apache.spark.serializer.KryoSerializer (see the sketch after this list). This serializer has a major impact on performance when we are shuffling or caching a large amount of data. To learn more, refer to the Kryo documentation.
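
A minimal Scala sketch of that configuration (the event class is invented for illustration):

    import org.apache.spark.SparkConf

    case class ClickEvent(userId: Long, url: String)

    val conf = new SparkConf()
      .setAppName("kryo-example")
      // Swap the default Java serializer for Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: fail fast if an unregistered class gets serialized.
      .set("spark.kryo.registrationRequired", "true")
      // Register the classes that will be shuffled or cached.
      .registerKryoClasses(Array(classOf[ClickEvent]))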

There are some good tips in here.

The Lesser-Known Apply Functions In R

Andrew Treadway covers a few of the lesser-known apply functions in R:

rapply

Let’s start with rapply. This function has a couple of different purposes. One is to recursively apply a function to a list; we’ll get to that in a moment. The other use of rapply is to apply a function only to those elements in a list (or columns in a data frame) that belong to a specified class. For example, let’s say we have a data frame with a mix of categorical and numeric variables, but we want to evaluate a function only on the numeric variables.

Click through for some examples of rapply as well as vapply and eapply.  I’ve used rapply to get the cardinality of each feature in a data frame, but the other two are new to me.  H/T R-bloggers
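
For the curious, the class-filtered flavor of rapply looks like this (a toy data frame, not Andrew’s example):

    df <- data.frame(
      category = c("a", "b", "c"),
      x = c(1.5, 2.5, 3.5),
      y = c(10, 20, 30),
      stringsAsFactors = FALSE
    )

    # Apply mean() only to the numeric columns; character columns drop out.
    rapply(df, mean, classes = "numeric", how = "unlist")
    #>    x    y
    #>  2.5 20.0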

New Diagnostics For Synchronous Statistics Updates

Joe Sack announces a new wait type and request command:

Consider the following query execution scenario:

  • You execute a SELECT query that triggers an automatic synchronous statistics update.
  • The synchronous statistics update begins execution and your query waits (is essentially blocked) until the fresh statistics are generated.
  • The query compilation and execution does not resume until the synchronous statistics update operation completes.

During this time, there are no external signs via common troubleshooting channels that the query is specifically waiting for the synchronous statistics update operation to complete.  If the statistics update takes a long time (due to a large table and/or a busy system), there is no easy way to determine the root cause of the high duration.

This is now available in SQL Server 2019 CTP 2.1.  Read the whole thing.
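
The new diagnostics surface in sys.dm_exec_requests, so a spot check during a suspiciously slow compile might look something like this:

    -- While a query is stuck behind a synchronous statistics update, its
    -- request shows command = SELECT (STATMAN) and the new wait type.
    SELECT r.session_id, r.command, r.wait_type, r.wait_time
    FROM sys.dm_exec_requests AS r
    WHERE r.wait_type = N'WAIT_ON_SYNC_STATISTICS_REFRESH';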

Deep Dive On Log Buffer Flushes

Itzik Ben-Gan delves into log buffer flushes and how SQL Server maintains durability without giving up too much performance:

The way SQL Server enforces transaction durability, in part, is by ensuring that all of the transaction’s changes are written to the database’s transaction log on disk before returning control to the caller. In a case of a power failure after a transaction’s commit was acknowledged, you know that all those changes were at least written to the on-disk transaction log. That’s the case even if the related data pages were modified only in the data cache (the buffer pool) but not yet flushed to the data files on disk. When you restart SQL Server, during the redo phase of the recovery process, SQL Server uses the information recorded in the log to replay changes that were applied after the last checkpoint and that haven’t made it to the data files. There’s a bit more to the story depending on the recovery model that you’re using and on whether bulk operations were applied after the last checkpoint, but for the purposes of our discussion, suffice to focus on the part that involves hardening the changes to the transaction log.

The tricky part in SQL Server’s logging architecture is that log writes are sequential. Had SQL Server not used some sort of a log buffer to alleviate log writes to disk, write-intensive systems—especially ones that involve lots of small transactions—would quickly run into terrible log-write-related performance bottlenecks.

To alleviate the negative performance impact of frequent sequential log writes to disk, SQL Server uses a log buffer in memory.

Itzik also covers a few technologies which can help if you’re experiencing log buffer flush-related slowness, including enabling delayed durability and purchasing nonvolatile memory (NVDIMM-N) for storage.
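
As a point of reference, delayed durability can be enabled database-wide or per transaction; a quick sketch (the database and table names are made up):

    -- Database-wide: DISABLED (the default), ALLOWED, or FORCED.
    ALTER DATABASE SalesDB SET DELAYED_DURABILITY = ALLOWED;

    -- Per transaction, once the database allows it:
    BEGIN TRANSACTION;
    UPDATE dbo.Orders SET Quantity += 1 WHERE OrderID = 42;
    COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);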
