Press "Enter" to skip to content

Author: Kevin Feasel

Reading Query Plans in Spark

Daniel Ciocirlan has a primer on query plans in Apache Spark:

Let’s go over some examples of query plans and how to read them. Let’s go back to the one we’ve just shown:

 == Physical Plan == *(1) Project [(id#0L * 5) AS id#2L]

+- *(1) Range (1, 1000000, step=1, splits=6)

We read this plan backwards, bottom to top:

Spark does have some UI components which make this a bit easier, but you’ll probably end up in a situation where you need to read it in this format.

Comments closed

Availability Group Offerings in Standard Edition

Guy Glantser notes an issue with Availability Group documentation:

In SQL Server 2017 Microsoft added a new flavor called Read-Scale Availability Groups. This is different, because the goal here is not high availability or disaster recovery, but rather read-scalability. As opposed to the other flavors, in RSAG there is no cluster, and there is also no automatic failover mechanism. But you can set up multiple secondary replicas with read-only access and load balancing, and offload read workloads from the primary replica. This is a great scalability feature, and you can read more about it here.

Now, if you check Microsoft documentation regarding the editions and supported features of SQL Server, you will be happy to see that RSAG is supported in Standard Edition. I was happy to see it too. Unfortunately, if you try to set up a Read-Scale Availability Group on Standard Edition, it will not work. You will only be able to create a Basic Availability Group, as discussed earlier.

Click through for the answer, as well as what you can do in Standard Edition.

Comments closed

The Benefits of Indirect Checkpoints

Aaron Bertrand has success with using indirect checkpoints:

I was a bit perplexed by this issue, since the system was certainly no slouch — plenty of cores, 3TB of memory, and XtremIO storage. And none of these FlushCache messages were ever paired with the 15 second I/O warnings in the error log. Still, if you stack a bunch of high-transaction databases on there, checkpoint processing can get pretty sluggish. Not so much because of the direct I/O, but more reconciliation that has to be done with a massive number of dirty pages (not just from committed transactions) scattered across such a large amount of memory, and potentially waiting on the lazywriter (since there is only one for the whole instance).

Read on for several links and the results of Aaron’s testing.

Comments closed

A Naming System for Schedules

Daniel Janik shares a naming scheme for schedules in systems like SQL Agent and Azure Data Factory:

This tip comes from my DBA days working with SQL Agent Job schedules. If you’ve ever worked on a server where many people created job schedules you’ll know exactly what I mean when I say the schedule names can be really annoying.

This is because the names are not meaningful at all. They are either a GUID thanks to SSRS or something useless like “Schedule 1” or you have 6 different versions of “Every 5 min” when the schedule actually only runs every 15 min on Mondays.

The Linux nerd in me says “Could’ve just used cron naming.” I think Daniel’s naming scheme takes a little bit of time to get used to, but it makes sense.

Comments closed

Queries Which Query Store Won’t Capture

Jeff Iannucci takes us through queries which Query Store doesn’t capture:

But what if you wanted to use Query Store for a different reason in a migration? What if to see which stored procedures are actually being used, and which are just sitting there unused like that box of commercial toilet paper you panic purchased last month?

What if you set the capture mode to “All”, but then noticed you weren’t actually capturing All of the queries in your database?

Click through for the full report.

Comments closed

I Remember Halloween

Paul White talks about the Halloween Problem:

Much has been written over the years about understanding and optimizing SELECT queries, but rather less about data modification. This series looks at an issue that is specific to INSERTUPDATEDELETE and MERGE queries – the Halloween Problem.

The phrase “Halloween Problem” was originally coined with reference to a SQL UPDATE query that was supposed to give a 10% raise to every employee who earned less than $25,000. The problem was that the query kept giving 10% raises until everyone earned at least $25,000.

We will see later on in this series that the underlying issue also applies to INSERTDELETE and MERGE queries, but for this first entry, it will be helpful to examine the UPDATE problem in a bit of detail.

This is a classic problem in data management and has led to a good bit of confusion over time about why database updates can perform worse than you’d expect.

Comments closed

Visualizing Ranking Data

Stephanie Evergreen gives us a few techniques for visualizing ranking data:

And any time your data could be visualized in a bar chart, you can always take a jump to a dot plot or lollipop chart. You got this.

Any of these variations will be a perfectly fine visual to show rank data at a single point in time. If you have rank over time OR rank comparison across multiple groups, try a Bump Chart.

I was going to recommend a Cleveland dot plot, myself.

Comments closed

Color Band by Group in Power BI

Marco Russo and Alberto Ferrari show how we can change color alteration to switch from row to row and instead go from group to group:

The background color of the rows depends on Sales[Order Number]. The background color switches between white and light gray every time the order number changes, so all the rows of the same order have the same background color and can be easily identified. You cannot obtain this visualization by only using a Power BI style, because the coloring of a row depends on the actual data in it. You can achieve this goal by using the conditional formatting feature in Power BI. You can set the background color of a cell according to the value of a measure. Therefore, you need a DAX formula that returns two values: one for the white rows and one for the gray rows. The value returned by the measure must alternate between those two values with each consecutive order number.

Read on for an example of how you can do this.

Comments closed

Using Powershell to Configure Database Mail and SQL Agent Alerts

Eric Cobb shows us how to use Powershell to set up database mail and SQL Agent alerts:

As a DBA, you need to know when there’s a problem on your SQL Servers. And while I highly recommend you use a full-fledged monitoring system, there are also some things you can set up on your SQL Servers so that they will tell you when certain things go wrong. This doesn’t replace a full monitoring system, but setting up the below alerts will give you notification when SQL Server encounters things like corruption or resource issues.

Even with a full-fledged monitoring system, there are places where you can still make use of mail and side alerts.

Comments closed

Foreign Keys and Non-Changing Updates

Brent Ozar has a warning for us:

If you update a row without actually changing its contents, does it still hurt?

Paul White wrote in detail about the impact of non-updating updates, proving that SQL Server works hard to avoid doing extra work where it can. That’s a great post, and you should read it.

But foreign keys add another level of complexity. If you have foreign keys, and you tell SQL Server that you’re updating the contents of those related columns, SQL Server will check ’em even when the data isn’t changing.

Click through for the demonstration. I don’t think I agree with Brent’s dichotomy as laid out at the end of the post—the back-and-forth about removing keys would only make sense if you’re on the edge of the database equivalent of the production possibility frontier and expecting to move well beyond that point very soon. I’m not sure how well that describes the average company, but it’s a side quibble.

Comments closed