Press "Enter" to skip to content

Curated SQL Posts

Deleting in Azure Data Factory

Meagan Longoria is happy that Azure Data Factory v2 now has a Delete activity:

It is a common practice to load data to blob storage or data lake storage before loading to a database, especially if your data is coming from outside of Azure. We often create a staging area in our data lakes to hold data until it has been loaded to its next destination. Then we delete the data in the staging area once our subsequent load is successful. But before February 2019, there was no Delete activity. We had to write an Azure Function or use a Logic App called by a Web Activity in order to delete a file. I imagine every person who started working with Data Factory had to go and look this up.

But now Data Factory V2 has a Delete activity.

Meagan shows how it works, what kinds of parameters you can set, and a couple of gotchas, so check it out.

PolyBase and Pushdown Limitations

I have a post covering something I learned about predicate pushdown against Hadoop in PolyBase:

Before I start, let’s talk about predicate pushdown for a moment. The gist of it is that when you have data in two sources, you have two options for combining the data:

1. Bring the data in its entirety from your remote source to your local target and work as though everything were in the local target to begin with. I’ll call this the streaming approach.

2. Send as much of your query’s filters, projections, and pre-conditions to the remote source, have the remote source perform some of the work, and then have the remote source send its post-operative data to the local target. Then, the local target once more treats this as though it were simply local data. This is the pushdown approach because you push down those predicates (that is, filters, projections, and pre-conditions).

Click through for the unfortunate finding, and vote up my UserVoice feature request if you'd like to see filters against string columns pushed down as well.
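
To put the two approaches in query terms, here's a rough sketch against a hypothetical external table over Hadoop, using the pushdown hints PolyBase exposes:

-- Streaming approach: pull every row from the Hadoop side and filter locally.
SELECT MachineID, SensorValue
FROM dbo.SensorReadings_External
WHERE SensorValue > 100
OPTION (DISABLE EXTERNALPUSHDOWN);

-- Pushdown approach: ask PolyBase to run the filter as a MapReduce job on the
-- Hadoop side and ship back only the qualifying rows.
SELECT MachineID, SensorValue
FROM dbo.SensorReadings_External
WHERE SensorValue > 100
OPTION (FORCE EXTERNALPUSHDOWN);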

Running Totals in DAX

Alberto Ferrari shows how you can calculate running totals in DAX:

A very common calculation in DAX is the year-to-date calculation (YTD), which aggregates values from the beginning of the year all the way to a certain date. A simple implementation uses the predefined DATESYTD function:

Sales YTD :=
CALCULATE (
    [Sales Amount],
    DATESYTD ( 'Date'[Date] )
)

But click through to see when this function stops being useful and what you should replace it with when it does.

CTP 2.3 and Multi-Scoped Temp Tables

Joe Sack announces a performance optimization when you generate temp tables and then work with them in a sub-procedure:

SQL Server 2019 introduces several performance optimizations which will improve performance with minimal changes required to your application code. In this blog post we’ll discuss one such improvement available in CTP 2.3: reduced recompilations for workloads using temporary tables in multiple scopes.

In order to understand this improvement, we’ll first go over the current behavior in SQL Server 2017 and prior. When referencing a temporary table with a DML statement (SELECT, INSERT, UPDATE, DELETE), if the temporary table was created by an outer scope batch, we will recompile the DML statement each time it is executed.

There’s a pretty big performance improvement here, but architecturally, I really don’t like it. It makes the inner procedure unrunnable on its own: you have to know which temp table(s) the caller should have created and what should be in them.
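
If you haven't seen the pattern in question, a minimal sketch looks something like this (the procedure and table names are mine):

CREATE OR ALTER PROCEDURE dbo.InnerProc
AS
BEGIN
    -- Before SQL Server 2019, this statement recompiles on every execution
    -- because #Stage was created in an outer scope.
    SELECT COUNT(*) AS StagedRows FROM #Stage;
END;
GO

CREATE OR ALTER PROCEDURE dbo.OuterProc
AS
BEGIN
    CREATE TABLE #Stage (Id INT NOT NULL);
    INSERT INTO #Stage (Id) VALUES (1), (2), (3);

    -- The inner procedure only works because this caller created #Stage first.
    EXEC dbo.InnerProc;
END;
GO

EXEC dbo.OuterProc;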

Performance Tuning DAX

Matthew Mowrey has some tips for tuning DAX code:

Optimization has become one of my favorite topics. OK, I’ll admit it: it might be an obsession and a source of joy. My inability to optimize (queries, data models, and DAX) used to be a source of headaches and the occasional heart attack depending on what I was trying to accomplish. I make no claim that I’m an optimization expert, but I’ve spent hours wondering and researching why a data model’s calculation time could go from a couple of seconds to many minutes. This is a rare occurrence, but when it has happened, it’s happened when I’ve been on the verge of something great. Of course, if you’re taking business intelligence to its edge, you’re not adding 2+2—you’re trying to give someone an answer they (and you) thought was impossible. A recalculation that takes many minutes (especially if you’re not the end-user) is unacceptable and may cause your data model to crash—fatal exceptions anyone?

This article focuses on an optimization technique that I couldn’t find anywhere. Before I get into that technique, I’d like to touch on what I consider to be the basic tenets of optimization when working with data modeling and DAX.

Read on for the technique.

When Deletes Increase Data Size

Brent Ozar shows a case where indexes can grow in size as you delete data:

Here’s one way that deletes can cause a table to grow:
– The rows were originally written when the database didn’t have Read Committed Snapshot (RCSI), Snapshot Isolation (SI), or Availability Groups (AGs) enabled
– RCSI or SI was enabled, or the database was added into an Availability Group
– During the deletions, a 14-byte timestamp was added to each deleted row to support RCSI/SI/AG reads

Click through for a demo and takeaways.
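
To picture the moving parts, the sequence runs roughly like this (the table here is made up; Brent's demo has the real numbers):

-- 1. Rows were written while the database had no row versioning in play.
-- 2. Turn on Read Committed Snapshot Isolation (or add the database to an AG).
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
GO

-- 3. Delete some of those older rows. Each deleted row now gets a 14-byte
--    version tag stamped on it, so pages can grow before ghost cleanup runs.
DELETE FROM dbo.SomeBigTable
WHERE CreateDate < '2018-01-01';
GO

-- 4. Compare reserved and index sizes before and after the delete.
EXEC sp_spaceused N'dbo.SomeBigTable';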

Querying CosmosDB with PolyBase

Hasan Savran shows how you can query CosmosDB with T-SQL statements:

SQL Server PolyBase services let us pull data from other data sources by using T-SQL queries. SQL Server 2019 introduces new connectors for PolyBase services like Oracle, Teradata, and MongoDB. In one of the SQL Server 2019 presentations from Bob Ward, I saw the CosmosDB logo when he was talking about the new connectors in SQL Server 2019. CosmosDB already has an ODBC driver and you can use it as a data source for Power BI, SSIS, or SSMS. SQL Server 2019 makes this connection easier by using the PolyBase services.

In this post, I am going to show you how to configure SQL Server 2019 PolyBase to connect to Azure CosmosDB Mongo databases.

I have a bit of a vested interest in this, so it is heartening to see people trying out PolyBase. It’s more heartening when they use it after they try it out.
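
The setup ends up looking roughly like the following; every name, key, and column below is a placeholder, and Hasan's post walks through the real steps:

-- PolyBase must be installed and enabled before any of this works.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL CosmosMongoCredential
WITH IDENTITY = '<cosmos account name>', SECRET = '<cosmos account key>';

-- Cosmos DB's Mongo API endpoint requires TLS, hence the connection option.
CREATE EXTERNAL DATA SOURCE CosmosMongo
WITH (
    LOCATION = 'mongodb://<cosmos account name>.documents.azure.com:10255',
    CREDENTIAL = CosmosMongoCredential,
    CONNECTION_OPTIONS = 'ssl=true'
);

CREATE EXTERNAL TABLE dbo.Orders_External
(
    _id NVARCHAR(24),
    CustomerName NVARCHAR(200),
    OrderTotal FLOAT
)
WITH (
    LOCATION = 'SalesDb.Orders',   -- <database>.<collection> on the Cosmos side
    DATA_SOURCE = CosmosMongo
);

SELECT TOP (10) * FROM dbo.Orders_External;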

Deep Thoughts on SLAs

Ben Slater has a great post on service level agreements:

It should set the customer’s expectations, be realistic and be crystal clear, with no scope for misinterpretation. Well, that’s how they are meant to be. But unfortunately, in the quest for more sales, some MSPs tend to commit themselves to unrealistic SLAs. It’s tempting to buy into a service when an MSP offers you 100% availability. It is even more tempting when you see a compensation clause that gives you confidence in going ahead with that MSP. But hold on! Have you checked out the exclusion clauses? Have you checked out your responsibilities in order to get what you are entitled to in the SLA? Just as it is the MSP’s responsibility to define a crystal-clear SLA, it is the customer’s responsibility to thoroughly understand the SLA and be aware of every clause in it. That is how you will notice the shades of gray!

We have put together a list of things to look for in an SLA so that customers are aware of the nuances involved and avoid unpleasant surprises after signing on.

This is important to know when you’re using a managed service, but it also works internally: as a developer or manager on a product that people (presumably) rely upon, how you deal with outages and issues matters a lot.

Making Stored Procedure Changes With Limited Downtime

I continue my series on database development in a (near) zero downtime environment:

Versioning a procedure is pretty simple: you create a new procedure with alterations you want. Corporate naming standards where I’m at have you add a number to the end of versioned procedures, so if you have dbo.SomeProcedure, the new version would be dbo.SomeProcedure01. Then, the next time you version, you’ll have dbo.SomeProcedure02 and so on. For frequently-changing procedures, you might get up to version 05 or 06, but in practice, you’re probably not making that many changes to a procedure’s signature. For example, looking at a directory with exactly 100 procedures in it, I see 7 with a number at the end. Two of those seven procedures are old versions of procedures I can’t drop quite yet, so that means that there are only five “unique” procedures that we’ve versioned in a code base which is two years old. Looking at a different part of the code with 879 stored procedures, 95 have been versioned at least once in the 15 or so years of that code base’s existence. The real number is a bit higher than that because we’ve renamed procedures over time and renamings tend to start the process over as we might go from dbo.SomeProcedure04 to dbo.SomeNewProcedure when we redesign underlying tables or make other drastic architectural changes.

The secret is, I’m always versioning.
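
In code, the pattern is about as simple as it sounds; the signatures here are invented for illustration:

-- The existing procedure stays put because deployed code still calls it.
CREATE OR ALTER PROCEDURE dbo.SomeProcedure
    @CustomerID INT
AS
BEGIN
    SELECT CustomerID, CustomerName
    FROM dbo.Customer
    WHERE CustomerID = @CustomerID;
END;
GO

-- The new version lives alongside it; callers move over as they deploy,
-- and dbo.SomeProcedure gets dropped once nothing references it.
CREATE OR ALTER PROCEDURE dbo.SomeProcedure01
    @CustomerID INT,
    @IncludeInactive BIT = 0
AS
BEGIN
    SELECT CustomerID, CustomerName, IsActive
    FROM dbo.Customer
    WHERE CustomerID = @CustomerID
        AND (IsActive = 1 OR @IncludeInactive = 1);
END;
GO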

Power BI AutoML

Teo Lachev takes a look at AutoML in Power BI:

Let’s see how AutoML works based on what’s in the private preview (the usual disclaimer is that things will probably change). To start with, AutoML requires a dataflow (a note to Microsoft here is that AutoML will become more pervasive if it’s available in Power BI Desktop and it doesn’t require a premium capacity). In the private preview, AutoML requires the following steps. Presumably, the first (and most difficult) step, preparing the dataset and cleansing the data, is already done and available as a dataflow entity.

It looks like Microsoft’s taking what they learned from Azure ML and trying to port it over to Power BI.
