
Author: Kevin Feasel

Join Elimination

Bert Wagner shows off the concept of join elimination in SQL Server:

SQL Server avoids joining to the Sales.Invoices table because it trusts the referential integrity maintained by the foreign key constraint defined on InvoiceID between Sales.InvoiceLines and Sales.Invoices; if a row exists in Sales.InvoiceLines, a row with the matching value for InvoiceID must exist in Sales.Invoices. And since we are only returning data from the Sales.InvoiceLines table, SQL Server doesn’t need to read any pages from Sales.Invoices at all.

We can verify that SQL Server is using the foreign key constraint to eliminate the join by dropping the constraint and running our query again:

ALTER TABLE [Sales].[InvoiceLines] DROP CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];
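
For reference, a query of the shape Bert describes looks something like this (a hedged sketch against the WideWorldImporters tables named above; Bert's post has the exact query and execution plans):

-- Only InvoiceLines columns are returned, and the trusted foreign key
-- guarantees every InvoiceID has a matching row, so Sales.Invoices is
-- never read.
SELECT il.InvoiceLineID,
       il.Quantity
FROM Sales.InvoiceLines AS il
INNER JOIN Sales.Invoices AS i
    ON il.InvoiceID = i.InvoiceID;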

Erik Darling shows that the optimizer isn’t perfect at this:

Rob Farley has my favorite material on it. There’s an incredible amount of laziness, er, ingenuity built into the optimizer to keep your servers from doing unnecessary work.

That’s why I’d expect a query like this to throw away the join:

After all, we’re joining the Users table to itself on the PK/CX. This doesn’t stand a chance at eliminating rows, producing duplicate rows, or producing NULL values. We’re only getting a count of the PK/CX, which isn’t NULLable anyway and…
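
The query itself is in Erik's post; it has roughly this shape (a sketch assuming the Stack Overflow Users table Erik typically demos against, with Id as the PK/CX):

-- A self-join on the PK/CX cannot eliminate rows, duplicate rows, or
-- introduce NULLs, yet the optimizer still performs the join.
SELECT COUNT(u.Id) AS records
FROM dbo.Users AS u
INNER JOIN dbo.Users AS u2
    ON u.Id = u2.Id;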

So don’t do that.


Auto-Indentation In Power BI’s DAX Formula Bar

Chris Webb shows an easy-to-miss feature in Power BI:

The other day I discovered something new (at least to me) while writing the DAX for a measure in Power BI Desktop: when you insert a new line in your DAX expression using SHIFT-ENTER it also auto-indents the code. I asked a few people if this was new because I was sure I hadn’t seen it before, even though I always put line breaks in my code; of course Marco had and said he thought it had been around for a while. Anyway, Marco then commented that most people didn’t know you could even put line breaks in DAX and I thought to myself I should probably write a short blog post about all this, because of course line breaks and indentation make your code much more readable.

Click through for a demo as well as a couple of tips around this.
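
For instance, a measure entered with SHIFT-ENTER line breaks ends up formatted roughly like this (the measure and column names are illustrative, not from Chris's post):

Sales Growth % =
DIVIDE (
    [Total Sales] - [Total Sales LY],
    [Total Sales LY]
)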


Multiple SYSDATETIME In The Same SELECT May Give Unexpected Results

Louis Davidson walks through a scenario he experienced:

The data is exactly as expected, even though the other two calls would have returned .902 and .903 if simply rounded off. On the other hand, looking for differences between the time1_3 and time2_3 columns:

Returns 133 rows, with the sysdatetime values being exactly the same:

But the other columns are incorrect for our needs, as the values are the same:

This was the bug we experienced! Instead of being 1 millisecond different, the values were equal.

Louis’s moral to the story is to test your assumptions.  The more prosaic moral is that calls to get the current time take a non-zero amount of time.
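
To see the behavior in isolation, here is a minimal sketch (not Louis's repro): each occurrence of SYSDATETIME() in a statement is evaluated on its own, so the two columns need not agree.

-- On some executions the two calls straddle a clock tick and differ.
SELECT SYSDATETIME() AS time1,
       SYSDATETIME() AS time2;

-- If the values must match, capture the time once and reuse it.
DECLARE @now DATETIME2(7) = SYSDATETIME();
SELECT @now AS time1,
       @now AS time2;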


Checking A Drive’s Allocation Unit Size

Ryan Adams shows how to find the allocation unit size for a disk volume:

To identify the allocation unit size for a volume, we can use the fsutil.exe utility.  In the output, you are looking for “Bytes Per Cluster”, which is your allocation unit size. Here is an example retrieving the information for the G:\ volume.

fsutil fsinfo ntfsInfo G:

Ryan also shows how to change the allocation size, should you need to do so.
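
Changing it means reformatting the volume, which erases everything on it. As a hedged example (not from Ryan's post), formatting G: with the 64 KB allocation unit commonly recommended for SQL Server data and log drives looks like this:

format G: /FS:NTFS /A:64K /Q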


The Dangers Of The Ellipsis In R

John Mount shows us an example where ... (the ellipsis) can come back to hurt us:

The following code example contains an easy error in using the R function unique().

vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")
unique(vec1, vec2)
# [1] "a" "b" "c"

Notice that none of the novel values from vec2 are present in the result. Our mistake was: we (improperly) tried to use unique() with multiple value arguments, as one would use union(). Also notice that no error or warning was signaled. We used unique() incorrectly and nothing pointed this out to us. What compounded our error was R's "..." function signature feature.
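
The fix is to use union(), or to concatenate before de-duplicating; a quick sketch:

# union() combines the distinct values from both vectors
union(vec1, vec2)
# [1] "a" "b" "c" "d"

# equivalently, concatenate first and then de-duplicate
unique(c(vec1, vec2))
# [1] "a" "b" "c" "d"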

John makes it clear that ... is not itself a bad thing, just that there is a time and a place for it and misusing it can lead to hard-to-understand bugs.


HDP 3.0 Released

Roni Fontaine and Saumitra Buragohain announce Hortonworks Data Platform version 3.0:

Other additional capabilities include:

  • Scalability and availability with NameNode federation, allowing customers to scale to thousands of nodes and a billion files. Higher availability with multiple NameNodes and standby capabilities allows for undisrupted, continuous cluster operations if a NameNode goes down.

  • Lower total cost of ownership with erasure coding, providing a data protection method that up to this point has mostly been found in object stores. Hadoop 3 will no longer default to storing three full copies of each piece of data across its clusters. Instead of that 3x hit on storage, the erasure coding method in Hadoop 3 will incur an overhead of 1.5x while maintaining the same level of data recoverability from disk failure. The end result is a 50% savings in storage overhead. (A brief command-line sketch of applying erasure coding follows this list.)

  • Real-time database, delivering improved query optimization to process more data at a faster rate by eliminating the performance gap between low-latency and high-throughput workloads. Enabled via Apache Hive 3.0, HDP 3.0 offers the only unified SQL solution that can seamlessly combine real-time and historical data, making both available for deep SQL analytics. New features such as workload management enable fine-grained resource allocation, so there is no need to worry about resource competition. Materialized views pre-compute and cache intermediate tables, and the query optimizer automatically leverages the pre-computed cache, drastically improving performance. The end result is faster time to insights.

  • Data science performance improvements around Apache Spark and Apache Hive integration. HDP 3.0 provides seamless Spark integration to the cloud. And a containerized TensorFlow technical preview, combined with GPU pooling, delivers a deep learning framework that makes deep learning faster and easier.
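
As a hedged illustration of the erasure coding bullet above (these commands are not from the announcement, and the path is a placeholder), Hadoop 3 applies erasure coding per directory; the RS-6-3 policy gives the 1.5x overhead mentioned:

# Enable the Reed-Solomon 6+3 policy, then apply it to a directory.
hdfs ec -enablePolicy -policy RS-6-3-1024k
hdfs ec -setPolicy -path /data/warehouse -policy RS-6-3-1024k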

Looks like it’s invite-only at the moment, but that should change pretty soon.  It also looks like I’ve got a new weekend project…


Visualizing Data In Real Time With SQL Server And Dash

Tomaz Kastrun shows how to use Python Dash to visualize data living in SQL Server in real time:

The need for visualizing real-time (or near-real-time) data has been and still is a very important daily driver for many businesses. Microsoft SQL Server has many capabilities to visualize streaming data and this time, I will tackle this issue using Python and the Dash package for building web applications and visualizations. Dash is built on top of Flask, React, and Plotly and gives a wide range of capabilities to create interactive web applications, interfaces, and visualizations.

Tomaz’s example hits SQL Server every half-second to grab the latest changes and gives us an example of roll-your-own streaming.
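
As a rough sketch of that polling pattern (the connection string, table, and column names are placeholders, not Tomaz's code), a Dash app can re-query SQL Server on a timer with dcc.Interval:

# A Dash app that re-queries SQL Server every 500 ms via dcc.Interval.
import dash
from dash import dcc, html, Input, Output
import plotly.graph_objs as go
import pyodbc

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id="live-graph"),
    dcc.Interval(id="poll", interval=500, n_intervals=0),  # fires every 500 ms
])

@app.callback(Output("live-graph", "figure"), Input("poll", "n_intervals"))
def refresh(_):
    # Grab the latest rows on every tick; the DSN and table are hypothetical.
    conn = pyodbc.connect("DSN=MySqlServer;Trusted_Connection=yes")
    rows = conn.cursor().execute(
        "SELECT TOP (100) EventTime, Value FROM dbo.Readings ORDER BY EventTime DESC"
    ).fetchall()
    conn.close()
    return go.Figure(data=[go.Scatter(x=[r.EventTime for r in rows],
                                      y=[r.Value for r in rows],
                                      mode="lines")])

if __name__ == "__main__":
    app.run_server(debug=True)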


Optimizing Conditionals In DAX

Marco Russo shows us a way to optimize mutually exclusive conditional calculations using DAX:

In previous articles, we discussed the importance of variables and how to optimize IF functions to reduce multiple evaluations of the same expression or measure. However, there are scenarios where the calculations executed in different branches of the same expression seem impossible to optimize. For example, consider the following pattern:

Amount :=
IF (
    <condition>,
    [Credit],
    [Debit]
)

In cases like this involving two branch measures (here, Credit and Debit), there do not seem to be any possible optimizations. However, by considering the nature of the two measures, they might be different evaluations of the same base measure in different filter contexts.
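
To make the idea concrete, here is a hedged sketch of that kind of rewrite (the table, column, and measure names are assumptions, not Marco's code). If both branches are one base measure evaluated under different filters:

-- Hypothetical definitions: the same base measure, filtered two ways.
Credit := CALCULATE ( [Amount Base], Transactions[Type] = "Credit" )
Debit := CALCULATE ( [Amount Base], Transactions[Type] = "Debit" )

then the condition can pick the filter value instead of picking a branch measure, so the base measure is evaluated only once:

Amount :=
VAR FilterValue = IF ( <condition>, "Credit", "Debit" )
RETURN
    CALCULATE (
        [Amount Base],
        Transactions[Type] = FilterValue
    )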

Read on for a couple of examples.


Backing Up SQL Server To S3

David Fowler shows how to back up SQL Server directly to an AWS S3 bucket:

I’ve been having a little play around with AWS recently and was looking at S3 (AWS’ cloud storage) when I thought to myself, I wonder if it’s possible to back up an on-premises SQL Server database directly to S3?

When we want to back up directly to Azure, we can use the ‘TO URL’ clause in our backup statement.  Although S3 buckets can also be accessed via a URL, I couldn’t find a way to back up directly to that URL.  Most of the solutions on the web have you backing up your databases locally, with a second step of the job using PowerShell to copy those backups up to your S3 buckets.  I didn’t really want to do it that way; I want to back up directly to S3 with no middle steps.  We like to keep things as simple as possible here at SQL Undercover: the more moving parts you’ve got, the more chance for things to go wrong.

So I needed a way for SQL Server to be able to directly access my buckets.  I started to wonder if it’s possible to map a bucket as a network drive.  A little hunting around and I came across this lovely tool, TNTDrive.  TNTDrive will let us do exactly that, and with the bucket mapped as a local drive, it was simply a case of running the backup to that drive.

Quite useful if your servers are in a disk crunch.  In general, I’d probably lean toward keeping on-disk backups and creating a job to migrate those backups to S3.
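
For that migration job, a hedged sketch using the AWS Tools for PowerShell (the bucket and folder names are placeholders):

# Upload local .bak files to an S3 bucket; all names here are hypothetical.
Import-Module AWSPowerShell
$backupDir = 'G:\Backups'
$bucket = 'my-sql-backups'
Get-ChildItem -Path $backupDir -Filter *.bak | ForEach-Object {
    # Write-S3Object pushes one file per backup, keyed by file name.
    Write-S3Object -BucketName $bucket -File $_.FullName -Key $_.Name
}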


“Server Is Configured For Windows Authentication Only” Error

Kenneth Fisher diagnoses a misleading error:

In general, the errors SQL gives are highly useful. Of course every now and again you get one that’s just confounding. The other day I saw the following error in the log:

Login failed for user ”. Reason: An attempt to login using SQL authentication failed. Server is configured for Windows authentication only. [CLIENT: ]

This one confused me for a couple of reasons. First, the user ”. Why an empty user? That’s not really helpful. And second, “Server is configured for Windows authentication only.”

But Kenneth shows that the server is configured for SQL authentication as well as Windows authentication.  Click through to see what gives.
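
As a quick sanity check on your own instance, SERVERPROPERTY reports the configured mode:

-- Returns 1 for Windows Authentication only, 0 for mixed mode.
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthOnly;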
