Press "Enter" to skip to content

Curated SQL Posts

PolyBase Bug Around Windows Authentication

I have a post documenting a bug in SQL Server 2019:

Here’s the short version of the bug. If you are connected using a Windows authenticated account and attempt to perform a PolyBase-related action, such as creating an external data source or querying from an external table, you receive the following error:

Msg 46721, Level 20, State 1, Line 5
Login failed. The login is from an untrusted domain and cannot be used with Integrated authentication.

Because this is an error with a severity level of 20, it kills your session.

Click through for the workaround. I had hoped that this would have been fixed with CU3, but it’s still in there.

Comments closed

Quickly Finding Distinct Values with T-SQL

Paul White does some amazing things with T-SQL, news at 11:

I will be looking for distinct values in the BountyAmount column of the dbo.Votes table, presented in bounty amount order ascending. The Votes table has just under 53 million rows (52,928,720 to be exact). There are just 19 different bounty amounts, including NULL.

The Stack Overflow 2013 database comes without nonclustered indexes to minimize download time. There is a clustered primary key index on the Id column of the dbo.Votes table. It comes set to SQL Server 2008 compatibility (level 100), but we will start with a more modern setting of SQL Server 2017 (level 140):

Getting the query down from 10.5 seconds to 1ms is crazy.

Comments closed

ADF.PROCFWK 1.1 Released

Paul Andrew has a new name for the metadata-driven framework in Azure Data Factory:

Hi data friends! Version 1.1 of my ADF.procfwk is ready!

Following the great response I got to the version 1.0 blog series I decided to push ahead and get version 1.1 out there as soon as possible. The main thing I’ve addressed with this release is the big problem of having hard coded service principal credentials in the body of the Azure Function request that called our execution pipelines. Hopefully we can all agree this was/is not great for a number of reasons.

Read on for more details including a detailed changelog, and check out the GitHub repo.

Comments closed

Visualizing Power BI Query Parallelism

Chris Webb gives us a way to visualize Power BI Premium and Azure Analysis Services query parallelism:

Why is this important? When you’re tuning the performance of a Power BI report the first thing to do is to look at the performance of the individual DAX queries run and make them run as fast as possible. However when a Power BI report is rendered any one query is likely to be run at the same time as several other queries run for the same report, and this will have an impact on its performance. How much of an impact there is will depend on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed for a query that can be run in parallel; you can see a table listing the number of v-cores for each Premium SKU here, and the number of QPUs for each Azure Analysis Services SKU here. As a result of this if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report and/or reducing the number of visuals needed to display the same information will also reduce the number of queries that need to be run and therefore improve overall performance.

Read on for the demo.

Comments closed

SQL Server 2019 Supports Ubuntu 18.04, SLES 12 SP5

Tejas Shah announces a couple new versions of Linux distributions SQL Server 2019 supports:

We are also glad to announce the availability of SQL Server 2019 CU3 container image build on Ubuntu 18.04. It is fully supported for production use. You can read about how to deploy the container at following page.

Running SQL Server 2019 CU3 container with Ubuntu 18.04 base: https://docs.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker?view=sql-server-ver15&p…

You could run SQL Server on these versions of Ubuntu and SUSE, but now you get production support, making it prudent to run on these. Ubuntu 20.4 is coming out soon, so we’ll see when that gets supported.

Comments closed

Tuning EMR Performance with Dr. Elephant and Sparklens

Nivas Shankar and Mert Hocanin show us how to use a couple of products to tune Hive and Spark jobs:

Data engineers and ETL developers often spend a significant amount of time running and tuning Apache Spark jobs with different parameters to evaluate performance, which can be challenging and time-consuming. Dr. Elephant and Sparklens help you tune your Spark and Hive applications by monitoring your workloads and providing suggested changes to optimize performance parameters, like required Executor nodes, Core nodes, Driver Memory and Hive (Tez or MapReduce) jobs on Mapper, Reducer, Memory, Data Skew configurations. Dr. Elephant gathers job metrics, runs analysis on them, and presents optimization recommendations in a simple way for easy consumption and corrective actions. Similarly, Sparklens makes it easy to understand the scalability limits of Spark applications and compute resources, and runs efficiently with well-defined methods instead of leaning by trial and error, which saves both developer and compute time.

This post demonstrates how to install Dr. Elephant and Sparklens on an Amazon EMR cluster and run workloads to demonstrate these tools’ capabilities. Amazon EMR is a managed Hadoop service offered by AWS to easily and cost-effectively run Hadoop and other open-source frameworks on AWS.

Even if you aren’t using ElasticMapReduce, Dr. Elephant and Sparklens are quite useful products.

Comments closed

Collation Conflicts with Extended Events

Jason Brimhall takes us through a nasty scenario:

Have you ever run into an error like this?

Cannot resolve the collation conflict between “pick a collation” and “pick another collation” in the equal to operation.

This kind of error seems pretty straight forward and it is safe to say that it generally happens in a query. When you know what the query is and you get this error, it is pretty easy to spot and fix the problem. At least you can band-aid it well enough to get past the error with a little use of the collate clause in your query.

But what if the error you are seeing is popping up when you are trying to use Management Studio (SSMS)? The error is less than helpful and can look a little something like this.

In this case, it was when trying to expand Extended Events sessions. Click through to see what’s going on and how to fix the problem.

Comments closed

Network Security Changes Around Azure SQL DB

Rohit Nayak announces some changes to Azure SQL Database’s connectivity and network security:

Now in general availability, Private Link enables users to have private connectivity from a Microsoft Azure Virtual Network to Azure SQL Database.

This feature creates a private endpoint which maps a private IP Address from the Virtual Network to your Azure SQL Database.

From security perspective, Private Link provides you with data exfiltration protection on the login path to SQL Database. Additionally, it does not require adding of any IP addresses to the firewall on Azure SQL Database or changing the connection string of your application.

Private Link is built on best of class Software Defined Networking (SDN) functionality from the Azure Networking team. Clients can connect to the Private endpoint from within the same Virtual Network, peered Virtual Networking the same region, or via VNet-to-VNet connection across regions. Additionally, clients can connect from on-premises using ExpressRoute, private peering, or VPN tunneling. More information can be found here

Click through to see what else they’ve been working on.

Comments closed

Finding Active Users by Date with DAX

Reza Rad takes us through the subscription pattern:

One of the common patterns of a data model is the subscription pattern. In this pattern, we have subscriptions (or tickets, or issues, or whatever else you want to call it) open date and close date. The way that this data is stored in a table makes it a bit challenging to get informative insight out of it. In this post (first of the series), I am going to explain about the subscription pattern and one of the common calculations needed for it; which is active subscribers at any given date, or open tickets at any given date.

In case you’re curious, here’s a solution which works in T-SQL. I’ve really taken to event-style tables, where there’s one row per state change, so instead of having a begin date and an end date for each action, have a row which contains the date and the type of action. This makes operating on the data a lot easier, though it does make rules preventing common entry problems (end date before start date, etc.) a bit trickier.

Comments closed