Press "Enter" to skip to content

Curated SQL Posts

Speeding Up Power BI Aggregations With Primary Keys

Chris Webb has an interesting use case for adding primary keys on lookup tables:

As you can see, the Property Type column from the #”Price Paid” query contains single letter codes describing the type of property sold in each transaction; the Property Type column from #“Property Types” contains a distinct list of the same codes and acts as a dimension table. Again there’s nothing interesting going on in this query.

The problems start when you try to join data from these two queries using a Merge and then, for each row in #”Property Types”, show the sum of the Price Paid column from #”Price Paid”.

Although baseline performance is bad, Chris shows a way of improving that performance significantly.

Comments closed

XML In Scala

Mahesh Chand Kandpal shows how to create XPath statements in Scala:

We called the \() on the XML element and asked it to look for all symbol elements.  It returns an instance of scala.xml.NodeSeq, which represents a collection of XML nodes.

The \() method looks only for the elements that are direct descendants of the target element(i.e symbol).   If we want to search through all the elements in the hierarchy starting from the target element, \\() method is used

Check it out, especially if you’re working with Spark, as you never know when a rogue XML file will head your way.

Comments closed

Remote DAC And Vulnerability Assessments

Max Vernon points out a SQL Server Management Studio Vulnerability Assessment check which seems somewhat incomplete:

Certainly, you’d want to ensure the port for the DAC is not available to the Internet, but hopefully if you’re reading this blog you already know how silly it would be to open SQL Server to the Internet.

Assuming you don’t have the port open to the Internet, it’s very likely the DAC will not be of any use at all if you disable Remote Admin Connections as advised in the Vulnerability Assessment. My advice is to ignore this warning completely and configure the DAC to allow remote connections. Microsoft Technet has documentation about using the DAC, and says to configure it for remote connections by logging onto the server locally first, then configuring SQL Server to allow remote DAC connections, which seems a bit like putting the cart before the horse.

Read the whole thing.  I agree with Max’s assessment that if there are some basic controls around your instance (like not letting SQL Server be Internet-accessible, putting SQL Server instances in a protected subnet, etc.), remote DAC is definitely useful enough to keep running.

Comments closed

Query Store Deadlock When Creating Databases

Andy Mallon ran into a weird issue with Query Store:

I tried the configuration a couple of times just to make sure it wasn’t a one-off problem. I installed the latest Cumulative Update (CU). I made sure nothing else was connected to the instance. I rebooted my machine. I restarted services. I banged my head against the wall. I asked a friend if I was insane or stupid. After confirming that I was both, my friend Aaron Bertrand (blog|twitter) confirmed it wasn’t a problem for him.

I discovered I could reproduce the problem simply by running the same simple statement that SSRS used when creating the ReportServer database. SSRS uses a non-standard collation, and specifying that collation seems to be the difference in causing the deadlock. Then I discovered that specifying ANY non-standard collation was causing the deadlock. This had nothing to do with SSRS, and everything to do with non-default collations.

Vote for his User Voice item too.

Comments closed

Adding Instance Name To The AlwaysON_health Session

Jonathan Kehayias shows how to add server_instance_name to the AlwaysON_health event session to make Availability Group troubleshooting easier:

The AlwaysOn_health event session in Extended Events is intended to make analyzing problems with Availability Groups possible after they have occurred.  While this event session goes a long way towards making it possible to piece together the puzzle of what went wrong in a lot of situations, it can still be a difficult task.  One of the things I wish Microsoft had included in the AlwaysON_health event session definition is the sqlserver.server_instance_name action for every event, and this is something that I usually recommend clients add to each of their AG servers using a script after I work with them the first time.  Why would this be useful, since if the files come from a specific server we should know the events are for that server right? Well, when we are working with AG configurations with more than two servers and trying to see the big picture of what is happening across the servers, it can be difficult to follow timelines with multiple files from multiple servers open.  It’s not impossible, but it does make things more difficult.

Click through to see how to do this through the UI or via T-SQL.

Comments closed

Performance Concern: Anti-Join And Top

Paul White explains a scenario in which an innocent-looking execution plan can hide something sinister:

Not every execution plan containing an apply anti join with a Top (1) operator on its inner side will be problematic. Nevertheless, it is a pattern to recognise and one which almost always requires further investigation.

The four main elements to look out for are:

  • A correlated nested loops (apply) anti join

  • Top (1) operator immediately on the inner side

  • A significant number of rows on the outer input (so the inner side will be run many times)

  • potentially expensive subtree below the Top

Read the whole thing.  This is a great way to wrap up the series.

Comments closed

The Microsoft Team Data Science Process Lifecycle Versus CRISP-DM

Melody Zacharias compares Microsoft’s Team Data Science Process lifecycle with the CRISP-DM process:

As I pointed out in my previous blog, the TDSP lifecycle is made up of five iterative stages:

  1. Business Understanding
  2. Data Acquisition and Understanding
  3. Modeling
  4. Deployment
  5. Customer Acceptance

This is not very different from the six major phases used by the Cross Industry Standard Process for Data Mining (“CRISP-DM”).

This is part of a series on data science that Melody is putting together, so check it out.

Comments closed

Corrupting Managed Instances

Brent Ozar has found a bug with Azure SQL Database Managed Instances:

Corruption happens. It’s just a fact of life – storage is gonna fail. Microsoft’s SLAs for storage only give you 3-4 9’s, and there’s nothing in there about never losing your data. Nothing against Azure, either – I’ve lost entire VMs in AWS due to storage corruption.

So let’s demo it. Normally, this kind of thing might be hard to do, but at the moment, DBCC WRITEPAGE is enabled (although I expect that to change before MIs hit General Availability.) I used Erik’s notorious sp_GoAheadAndFireMe to purposely corrupt the master database (not TempDB. I modified it to work with a user database instead, ran it, and in less than ten seconds, the entire instance went unresponsive.

It’s a good post, so check it out.

Comments closed

Collecting PRINT Outputs From Powershell

Jana Sattainathan shows how to query a number of SQL Server instances in parallel using Powershell and collecting the PRINT outputs from each:

As an example, you may have a block of SQL that PRINTs out the current privileges in the databasethat can then be saved off and used as an independent script.

In my case today, I need to collect information on the Service Pack and Cumulative Updates that need to be installed/applied to 200+ SQL Server instances. MS provides the SQL script to identify the right SP and CU to install. Please make sure you download the zip file and unzip the long SQL. I need to run this SQL against the instances to get the info. However, the information returned is completely with PRINT statements!

Click through to see how Jana did it.

Comments closed