Press "Enter" to skip to content

Author: Kevin Feasel

Using an Azure VM’s D Drive for tempdb

William Assaf shows how you can use the temporary D drive on an Azure VM to host tempdb in SQL Server:

Moving your SQL Server instance’s TempDB files to the D: volume is recommended for performance, as long as the TempDB files fit it the D: that has been allocated, based on your VM size. 
When the D: is lost due to deallocation, as expected, the subfolder you created for the TempDB files (if applicable) and the NTFS permissions granting SQL Server permission to the folder are no longer present. SQL Server will be unable to create the TempDB files in the subfolder and will not start. Even if you put the TempDB data and log files in the root of D:, after deallocation, that’s still not a solution, as the NTFS permissions to the root of D: won’t exist. In either case, SQL Server will be unable to create the TempDB files and will not start.

Read on for a few options and William’s thoughts on the relative merits of each.

Comments closed

MAX Versus LASTDATE in DAX

Alberto Ferrari explains why you might want to use MAX() instead of LASTDATE() to find the latest date in a table with DAX:

Many DAX newbies use LASTDATE to search for the last date in a time period. Or they use NEXTDAY to retrieve the day after a given date. Although these functions do what they promise, they are not intended to be used in simple expressions. Instead, they are table functions designed to be used in time intelligence calculations. Using them the wrong way leads to inefficient code. Moreover, using these functions in ways they were not designed for is a telltale sign that the developer still does not grasp certain details of DAX.

In this article, we elaborate on the topic in order to understand what these time intelligence functions do; we also want to understand the reason why it is so easy to confuse them with simple math over dates. We want to elaborate on this topic through examples. Therefore, instead of starting with boring theory, we start by looking at a calculation that – although it works perfectly fine – is inherently wrong.

Read on for that explanation.

Comments closed

Finding Indexes Not in Use

Dennes Torres takes us through a few iterations of a query to find indexes not in use:

It doesn’t matter if you are trying to remove indexes for good reasons or just to work around a bad environment, let’s see in more details how to find which indexes doesn’t have enough usage to justify their existance.

First, some basic definitions, without going into many details:

Index Seek: That’s the best and desirable use of the index. It means the index tree is being used to go directly to the records we need.

Index Scan: Not so good as an index seek, so it could be better. However, sometimes even an index scan is good, a non clustered index scan means the pages of that index are smaller an better for a scan than the pages of the clustered index. There are many variations that makes an index scan good, but most times you don’t need to reach this level of analysis, you may reach your objective only analysing index seeks.

Update: When the fields are updated (update/insert/delete) all indexes which contain those fields need to be updated as well. Indexes are a balance: We increase performance on reading and suffer a bit more when writting. The problem is when the writting happens more than the reading.

Read on to see Dennes’s query evolve and bring important information to the table. For example, it’s not just how often a particular index gets used; it’s also how important the queries are which use this index. An index may only run once a month, but if it turns the most important report the CEO cares about from running in 4 hours to running in 4 seconds, you bet that index is staying.

Comments closed

Choosing a Power BI Report Type

Paul Turley compares Power BI paginated and analytic reports:

This brings me to the subject of this post: Paginated and Analytic reports.

Before we had Power BI, we had Reporting Services. When the focus of my career and consulting practice became Business Intelligence – fifteen to about five years ago – most of the “BI reports” I created were in SQL Server Reporting Services (SSRS). We could create a variety of charts and reports that allowed users to drill-down from summary to details. We could create reports that allowed a user to drill-through, from one record or report item to another report that presented more details. With a little creativity, we could create reports that allowed users to click items that would filter other items on the same report. It took some work and time to create these interactive “drill-through-to-self” reports. Today, after transforming data into the right format for a tabular data model, this type of interactive functionality just magically happens in Power BI with very little report design effort. But, Power BI is primarily a tool for analyst users to create their own reports and to get answers to business questions rather than for IT developers to create reports that print or export perfectly to a PDF or Excel.

For now, Paul is asking for thoughts and questions with the promise that there will be an update to the post. So stop on by and ask a question or two.

Comments closed

Power BI (Lack of) Performance with Sharepoint

Matt Allington is not impressed:

On the face of it, it seems like a great idea to leverage SharePoint as a storage location for CSV and Excel files.

– Everyone has easy access to the files for editing and storage
– SharePoint manages version control, check in, check out etc
– SharePoint can facilitate shared editing of files
– You can build a Power BI report that will refresh online without the need to install a gateway.

Unfortunately, despite the benefits, the experience is not great.  Power BI performance with SharePoint as a data source is simply terrible.  Ultimately, the problems come down to performance in 2 areas.

Read on to learn more about these two issues and what you can do instead.

Comments closed

Eager Spools in SQL Server

Erik Darling describes the concept of eager spooling:

Not only does SQL Server create an index for you, it doesn’t really tell you about it. There’s no loud warning here.

It also throws that index away when the query is done executing. It only exists in tempdb while the query executes, and it’s only available to the query that builds it.

And boy, they sure can take a long time to build.

I enjoy and frequently use Erik’s depiction that an eager spool is SQL Server’s passive-aggressive way of telling you that you need an index.

Comments closed

Renaming Cached DataFrames in Spark

Landon Robinson works around an annoyance:

But DataFrames have not been given the same, clear route to convenient renaming of cached data. It has, however, been attempted and requested by the community:

https://forums.databricks.com/questions/6525/how-to-setname-on-a-dataframe.html
https://issues.apache.org/jira/browse/SPARK-8480

However, with the below approach, you can start naming your DataFrames all you want. It’s very handy.

Read on to see the solution in action.

Comments closed

A Kafka Alternative for Message Prioritization

Ricardo Ferreira explains why Apache Kafka doesn’t support message prioritization and gives a second-best solution when you do need it:

Users of messaging technologies such as JMS and AMQP often use message prioritization so that messages can be processed in a different order based on their importance. It doesn’t take much imagination to see where this can be useful—call center companies will want to handle the most severe cases first, then others; airline companies give preference to their service treats to customers with higher status, and telecommunication companies would probably give their most loyal customers a better renewal promotion.

One of the misconceptions that developers have when they first encounter Apache Kafka® is that it is just another messaging system, just like the ones they’re familiar with. By extension from that, they get disappointed when they find that Kafka doesn’t offer message prioritization.

Read on for the answers.

Comments closed

Thinking about Temporary Stats on Snapshots

Lonny Niederstadt shares an interesting scenario:

Consider a snapshot database which is created daily.  The purpose is to provide analytics reporting access while maintenance or data loads take place in the source database.  In the snapshot database, analytics reports have no locking concerns from the activity in the underlying source database.  And the temporary statistics provided by SQL Server, combined with the statistics inherited from the source database, provide a lot of information to the optimizer for query plan selection.
But what if significant fact tables are queried in the snapshot and leave a situation like col2 in stats_test?  A column which generates an auto-created stat in the snapshot, but never gets a statistic created in the source database.  Each day, the cost of creating that statistic and every statistic like it will be paid as part of the workload.  Even if the underlying table is a now-stable dimension. 

Click through for the demonstration.

Comments closed

The Importance of Specifying Nullability

Josh Darnell points out a case where not specifying nullability on columns can lead to confusion:

I always indicate whether a column allows NULL or not in CREATE TABLE and ALTER TABLE statements.

Or, you know, I’m in the habit of doing that.

Like, I try to be in the habit.

I’m doing my best, okay?

Anyway, it’s good to be clear about nullability, because you can run into surprises when you aren’t. 

Click through for a surprise.

Comments closed