Press "Enter" to skip to content

Curated SQL Posts

Reading Query Plans in Spark

Daniel Ciocirlan has a primer on query plans in Apache Spark:

Let’s go over some examples of query plans and how to read them. Let’s go back to the one we’ve just shown:

 == Physical Plan == *(1) Project [(id#0L * 5) AS id#2L]

+- *(1) Range (1, 1000000, step=1, splits=6)

We read this plan backwards, bottom to top:

Spark does have some UI components which make this a bit easier, but you’ll probably end up in a situation where you need to read it in this format.

Comments closed

Database Integrity in Cloudera Data Platform

Gokul Kamaraj and Liliana Kadar continue a series on operational database tooling in Hadoop:

Referential integrity is supported through the implementation of ‘constraints’ as well as enforcing business rules for attributes in the table. 

Constraints are configurable, and you can use it across different tables. Keep in mind that you have to choose a behavior depending on the specific configuration given to that constraint. 

This is rather underdeveloped compared to relational database platforms, but it’s still an improvement over the olden days, in which referential integrity was “write code which does that after the fact.”

Comments closed

Availability Group Offerings in Standard Edition

Guy Glantser notes an issue with Availability Group documentation:

In SQL Server 2017 Microsoft added a new flavor called Read-Scale Availability Groups. This is different, because the goal here is not high availability or disaster recovery, but rather read-scalability. As opposed to the other flavors, in RSAG there is no cluster, and there is also no automatic failover mechanism. But you can set up multiple secondary replicas with read-only access and load balancing, and offload read workloads from the primary replica. This is a great scalability feature, and you can read more about it here.

Now, if you check Microsoft documentation regarding the editions and supported features of SQL Server, you will be happy to see that RSAG is supported in Standard Edition. I was happy to see it too. Unfortunately, if you try to set up a Read-Scale Availability Group on Standard Edition, it will not work. You will only be able to create a Basic Availability Group, as discussed earlier.

Click through for the answer, as well as what you can do in Standard Edition.

Comments closed

A Naming System for Schedules

Daniel Janik shares a naming scheme for schedules in systems like SQL Agent and Azure Data Factory:

This tip comes from my DBA days working with SQL Agent Job schedules. If you’ve ever worked on a server where many people created job schedules you’ll know exactly what I mean when I say the schedule names can be really annoying.

This is because the names are not meaningful at all. They are either a GUID thanks to SSRS or something useless like “Schedule 1” or you have 6 different versions of “Every 5 min” when the schedule actually only runs every 15 min on Mondays.

The Linux nerd in me says “Could’ve just used cron naming.” I think Daniel’s naming scheme takes a little bit of time to get used to, but it makes sense.

Comments closed

The Benefits of Indirect Checkpoints

Aaron Bertrand has success with using indirect checkpoints:

I was a bit perplexed by this issue, since the system was certainly no slouch — plenty of cores, 3TB of memory, and XtremIO storage. And none of these FlushCache messages were ever paired with the 15 second I/O warnings in the error log. Still, if you stack a bunch of high-transaction databases on there, checkpoint processing can get pretty sluggish. Not so much because of the direct I/O, but more reconciliation that has to be done with a massive number of dirty pages (not just from committed transactions) scattered across such a large amount of memory, and potentially waiting on the lazywriter (since there is only one for the whole instance).

Read on for several links and the results of Aaron’s testing.

Comments closed

I Remember Halloween

Paul White talks about the Halloween Problem:

Much has been written over the years about understanding and optimizing SELECT queries, but rather less about data modification. This series looks at an issue that is specific to INSERTUPDATEDELETE and MERGE queries – the Halloween Problem.

The phrase “Halloween Problem” was originally coined with reference to a SQL UPDATE query that was supposed to give a 10% raise to every employee who earned less than $25,000. The problem was that the query kept giving 10% raises until everyone earned at least $25,000.

We will see later on in this series that the underlying issue also applies to INSERTDELETE and MERGE queries, but for this first entry, it will be helpful to examine the UPDATE problem in a bit of detail.

This is a classic problem in data management and has led to a good bit of confusion over time about why database updates can perform worse than you’d expect.

Comments closed

Queries Which Query Store Won’t Capture

Jeff Iannucci takes us through queries which Query Store doesn’t capture:

But what if you wanted to use Query Store for a different reason in a migration? What if to see which stored procedures are actually being used, and which are just sitting there unused like that box of commercial toilet paper you panic purchased last month?

What if you set the capture mode to “All”, but then noticed you weren’t actually capturing All of the queries in your database?

Click through for the full report.

Comments closed

Visualizing Ranking Data

Stephanie Evergreen gives us a few techniques for visualizing ranking data:

And any time your data could be visualized in a bar chart, you can always take a jump to a dot plot or lollipop chart. You got this.

Any of these variations will be a perfectly fine visual to show rank data at a single point in time. If you have rank over time OR rank comparison across multiple groups, try a Bump Chart.

I was going to recommend a Cleveland dot plot, myself.

Comments closed

Using Powershell to Configure Database Mail and SQL Agent Alerts

Eric Cobb shows us how to use Powershell to set up database mail and SQL Agent alerts:

As a DBA, you need to know when there’s a problem on your SQL Servers. And while I highly recommend you use a full-fledged monitoring system, there are also some things you can set up on your SQL Servers so that they will tell you when certain things go wrong. This doesn’t replace a full monitoring system, but setting up the below alerts will give you notification when SQL Server encounters things like corruption or resource issues.

Even with a full-fledged monitoring system, there are places where you can still make use of mail and side alerts.

Comments closed

Color Band by Group in Power BI

Marco Russo and Alberto Ferrari show how we can change color alteration to switch from row to row and instead go from group to group:

The background color of the rows depends on Sales[Order Number]. The background color switches between white and light gray every time the order number changes, so all the rows of the same order have the same background color and can be easily identified. You cannot obtain this visualization by only using a Power BI style, because the coloring of a row depends on the actual data in it. You can achieve this goal by using the conditional formatting feature in Power BI. You can set the background color of a cell according to the value of a measure. Therefore, you need a DAX formula that returns two values: one for the white rows and one for the gray rows. The value returned by the measure must alternate between those two values with each consecutive order number.

Read on for an example of how you can do this.

Comments closed