Press "Enter" to skip to content

Curated SQL Posts

ADF.PROCFWK 1.1 Released

Paul Andrew has a new name for the metadata-driven framework in Azure Data Factory:

Hi data friends! Version 1.1 of my ADF.procfwk is ready!

Following the great response I got to the version 1.0 blog series I decided to push ahead and get version 1.1 out there as soon as possible. The main thing I’ve addressed with this release is the big problem of having hard coded service principal credentials in the body of the Azure Function request that called our execution pipelines. Hopefully we can all agree this was/is not great for a number of reasons.

Read on for more details including a detailed changelog, and check out the GitHub repo.

Comments closed

Visualizing Power BI Query Parallelism

Chris Webb gives us a way to visualize Power BI Premium and Azure Analysis Services query parallelism:

Why is this important? When you’re tuning the performance of a Power BI report the first thing to do is to look at the performance of the individual DAX queries run and make them run as fast as possible. However when a Power BI report is rendered any one query is likely to be run at the same time as several other queries run for the same report, and this will have an impact on its performance. How much of an impact there is will depend on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed for a query that can be run in parallel; you can see a table listing the number of v-cores for each Premium SKU here, and the number of QPUs for each Azure Analysis Services SKU here. As a result of this if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report and/or reducing the number of visuals needed to display the same information will also reduce the number of queries that need to be run and therefore improve overall performance.

Read on for the demo.

Comments closed

SQL Server 2019 Supports Ubuntu 18.04, SLES 12 SP5

Tejas Shah announces a couple new versions of Linux distributions SQL Server 2019 supports:

We are also glad to announce the availability of SQL Server 2019 CU3 container image build on Ubuntu 18.04. It is fully supported for production use. You can read about how to deploy the container at following page.

Running SQL Server 2019 CU3 container with Ubuntu 18.04 base: https://docs.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker?view=sql-server-ver15&p…

You could run SQL Server on these versions of Ubuntu and SUSE, but now you get production support, making it prudent to run on these. Ubuntu 20.4 is coming out soon, so we’ll see when that gets supported.

Comments closed

Tuning EMR Performance with Dr. Elephant and Sparklens

Nivas Shankar and Mert Hocanin show us how to use a couple of products to tune Hive and Spark jobs:

Data engineers and ETL developers often spend a significant amount of time running and tuning Apache Spark jobs with different parameters to evaluate performance, which can be challenging and time-consuming. Dr. Elephant and Sparklens help you tune your Spark and Hive applications by monitoring your workloads and providing suggested changes to optimize performance parameters, like required Executor nodes, Core nodes, Driver Memory and Hive (Tez or MapReduce) jobs on Mapper, Reducer, Memory, Data Skew configurations. Dr. Elephant gathers job metrics, runs analysis on them, and presents optimization recommendations in a simple way for easy consumption and corrective actions. Similarly, Sparklens makes it easy to understand the scalability limits of Spark applications and compute resources, and runs efficiently with well-defined methods instead of leaning by trial and error, which saves both developer and compute time.

This post demonstrates how to install Dr. Elephant and Sparklens on an Amazon EMR cluster and run workloads to demonstrate these tools’ capabilities. Amazon EMR is a managed Hadoop service offered by AWS to easily and cost-effectively run Hadoop and other open-source frameworks on AWS.

Even if you aren’t using ElasticMapReduce, Dr. Elephant and Sparklens are quite useful products.

Comments closed

Collation Conflicts with Extended Events

Jason Brimhall takes us through a nasty scenario:

Have you ever run into an error like this?

Cannot resolve the collation conflict between “pick a collation” and “pick another collation” in the equal to operation.

This kind of error seems pretty straight forward and it is safe to say that it generally happens in a query. When you know what the query is and you get this error, it is pretty easy to spot and fix the problem. At least you can band-aid it well enough to get past the error with a little use of the collate clause in your query.

But what if the error you are seeing is popping up when you are trying to use Management Studio (SSMS)? The error is less than helpful and can look a little something like this.

In this case, it was when trying to expand Extended Events sessions. Click through to see what’s going on and how to fix the problem.

Comments closed

Network Security Changes Around Azure SQL DB

Rohit Nayak announces some changes to Azure SQL Database’s connectivity and network security:

Now in general availability, Private Link enables users to have private connectivity from a Microsoft Azure Virtual Network to Azure SQL Database.

This feature creates a private endpoint which maps a private IP Address from the Virtual Network to your Azure SQL Database.

From security perspective, Private Link provides you with data exfiltration protection on the login path to SQL Database. Additionally, it does not require adding of any IP addresses to the firewall on Azure SQL Database or changing the connection string of your application.

Private Link is built on best of class Software Defined Networking (SDN) functionality from the Azure Networking team. Clients can connect to the Private endpoint from within the same Virtual Network, peered Virtual Networking the same region, or via VNet-to-VNet connection across regions. Additionally, clients can connect from on-premises using ExpressRoute, private peering, or VPN tunneling. More information can be found here

Click through to see what else they’ve been working on.

Comments closed

Finding Active Users by Date with DAX

Reza Rad takes us through the subscription pattern:

One of the common patterns of a data model is the subscription pattern. In this pattern, we have subscriptions (or tickets, or issues, or whatever else you want to call it) open date and close date. The way that this data is stored in a table makes it a bit challenging to get informative insight out of it. In this post (first of the series), I am going to explain about the subscription pattern and one of the common calculations needed for it; which is active subscribers at any given date, or open tickets at any given date.

In case you’re curious, here’s a solution which works in T-SQL. I’ve really taken to event-style tables, where there’s one row per state change, so instead of having a begin date and an end date for each action, have a row which contains the date and the type of action. This makes operating on the data a lot easier, though it does make rules preventing common entry problems (end date before start date, etc.) a bit trickier.

Comments closed

Restoring Databases Inside a Container

Aaron Bertrand has a quick-and-easy method of restoring databases in a Docker container:

I have been using Docker containers for local development and testing for some time now; I first blogged about my steps into this brave new world back in late 2016. Most of the time, I just need to create some throwaway database, with a throwaway table, to prove a point or to validate an answer I’ve supplied.

Sometimes, though, I need to work with a real database. It’s a little trickier to do this in a container, because it’s isolated — I can’t just attach or restore from my Downloads folder. I could fire up a VM and attach there, but I actually don’t even use Parallels on my work laptop, and I find that using VMs leads to response times that are a lot more sluggish across the board.

It’s not too difficult to get files into your containers, provided your instance will have a total size less than the size of the container, and this quick tutorial proved to be very helpful.

Read on for the walkthrough. Aaron notes that this is a Mac-specific walkthrough, though the Windows and Linux versions are pretty similar as well, as we still create persistent volumes.

Comments closed

Using Python to Pivot Data in SQL Server

Rajendra Gupta shows a few ways to pivot data using Python:

We can use groupby and lambda functions as well in the Python scripts for Pivot tables. For this example, I have a data set of a few states of India and their cities in a SQL table.

We need a pivot table from this data. In the output, it should list all cities for a state in a column; it should use || as a city name separator.

This is an unorthodox but interesting use of Machine Learning Services.

Comments closed