Press "Enter" to skip to content

Author: Kevin Feasel

Azure Data Studio August Release

Alan Yu announces a new version of Azure Data Studio:

The key highlights to cover this month include:
– SandDance integration—A new way to interact with data
– Notebook improvements
– SQL Server Dacpac extension can support Azure Active Directory
– SQL Server 2019 extension
– Visual Studio Code merge 1.37
– Bug fixes

Being able to add a new cell inline is nice, especially when you’re dealing with larger notebooks.

Comments closed

Isolation Levels and Dynamic SQL

Max Vernon points out how transaction isolation levels work when combined with sp_executesql:

Imagine you have a piece of code where you don’t care about the downsides to the “read uncommitted” isolation level, and do your due diligence by adding SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; at the start of your code. The code following that statement will run under that isolation level, as expected. However, if you call dynamic T-SQL with sys.sp_executesql, and set the isolation level there, the dynamic code will run under the READ UNCOMMITTED, however the isolation level will not be changed for the calling code. In other words, be careful about where you set the isolation level.

Click through for a demonstration of this.

Comments closed

Auditing Database Backups

Jovan Popovic shows how you can audit who is taking backups on an Azure SQL Managed Instance:

One mechanism to ensure that nobody can take the COPY_ONLY backup of your database is to use Transparent Data Encryption that automatically encrypts all backups. In that case you would need to use Customer-managed (BYOK) TDE where you will keep your encryption key in Azure Key-Vault. User-initiated COPY_ONLY backups are (currently) not allowed if you are using Service-managed TDE.

If you don’t use TDE on the database or there is a risk that someone can remove TDE from database and then take a backup, Managed Instance provides auditing mechanism that enables you to track who performed a backup and when. This way you can always track if some unauthorized action is happening.

Read on for the solution. Looking through it, it seems like conceptually it’d work equally well with on-prem/IaaS SQL Server as with Managed Instances.

Comments closed

In the Papers: Plan Stitch

Brent Ozar reviews a Microsoft Research paper:

In the Microsoft Research paper Plan Stitch: Harnessing the Best of Many Plans by Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya (2018), the authors propose something really cool: watching an execution plan change over time, splitting it into parts, and then combining parts to form a SuperPlan™. (They don’t call it a SuperPlan™ – it just seems appropriate, right?)

That looks like an interesting paper, and Brent has a few more if you agree.

Comments closed

When xp_logininfo Fails

Gianluca Sartori helps Future Gianluca (and present us in the meantime) troubleshoot issues with xp_logininfo:

The user does not exist
This is very easy to check: does the user exist in Windows? Did you misspell the name?

You can check this from a cmd window, issuing this command:

net user SomeUser /domain

If you spelled the user correctly, the command will return information about it, like description, password settings, group membership and so on.

There are a few other potential causes, so click through for those.

Comments closed

When Adding Indexes Hurts Performance

Jeffry Schwartz takes us through an odd case:

Recently, a customer requested that we tune a query that took 13 seconds to return 11 rows.  SQL Server 2017 suggested an index to improve performance, so we added it in a development environment.  The improvement made the query run 647 seconds, almost 50 TIMES longer than the original!  This naturally caused much consternation, so we decided to determine what and why it happened as well as how we could still achieve the original objective, i.e., make the query run faster.  This article discusses what caused the original performance problem in addition to the new one that was caused by the introduction of an index, and illustrates how we were able to make the query run significantly faster than it did originally.  We will cover reading query plans, examining the specific details of query plan operators, the effects of index statistics on missing index recommendations, using query plan XML to enable simpler query plan comparison, and the effects of using functions in where clauses. 

Click through to understand how this could be the case.

Comments closed

Machine Learning and Delta Lake

Brenner Heintz and Denny Lee walk us through solving data engineering problems with Delta Lake:

As a result, companies tend to have a lot of raw, unstructured data that they’ve collected from various sources sitting stagnant in data lakes. Without a way to reliably combine historical data with real-time streaming data, and add structure to the data so that it can be fed into machine learning models, these data lakes can quickly become convoluted, unorganized messes that have given rise to the term “data swamps.”

Before a single data point has been transformed or analyzed, data engineers have already run into their first dilemma: how to bring together processing of historical (“batch”) data, and real-time streaming data. Traditionally, one might use a lambda architecture to bridge this gap, but that presents problems of its own stemming from lambda’s complexity, as well as its tendency to cause data loss or corruption.

Read the whole thing.

Comments closed

Cloudera Stream Processing

Dinesh Chandrasekhar announces the new iteration of Cloudera’s streaming data processor:

Cloudera Stream Processing (CSP) is a product within the Cloudera DataFlow platform that packs Kafka along with some key streaming components that empower enterprises to handle some of the most complex and sophisticated streaming use cases. CSP provides advanced messaging, real-time processing and analytics on real-time streaming data using Apache Kafka. CSP also supports key management and monitoring capabilities powered by Cloudera Streams Management (CSM).

Sounds like they’re taking the Kafka route over Spark Streaming, Flink, Airflow, etc.

Comments closed

Reading SQL Server Error Logs

Thomas Rushton has a script for us:

Why Script This? What’s Wrong With SSMS’s GUI?
Well, although SSMS does allow you to look at the error logs, it’s not very helpful for filtering – you can only filter for items that match, rather than exclude items. There are a few other filters as well – I guess the whole thing is just a wrapper around xp_readerrorlog below…

But Thomas has a better way for us.

Comments closed

Rounding to Intervals in Various Languages

Dave Mason doesn’t like rounding to intervals very much in T-SQL:

If I had to choose any of these options for production, I’d probably go with Query 1, just because I think it would make the most sense to any other developer that might encounter it (including future me, who probably wouldn’t remember writing the code). But I’m not really thrilled with any of the options. The one thing T-SQL has going for it though, is the relative ease for truncating the TIME off of a DATETIME by casting to DATE and back to DATETIME. I couldn’t find anything like this in DAX.

Dave also shows how to do this in DAX and Powershell.

Another alternative that Dave doesn’t mention is to invert the problem: if you have a fixed set of intervals you care about (e.g., 15-minute, 30-minute, hour, 4-hour, etc.), you can create a time table. This is like a date table but contains times of the day where you’ve precalculated the intervals. Then you join to the time table and have your results right there. If you do go this route, I’d try to keep the grain of the time table as shallow as possible, maybe using DATETIME2(0) instead of DATETIME2(7).

Comments closed