Press "Enter" to skip to content

Curated SQL Posts

The Pain of OR Clauses

Erik Darling wants you to embrace the healing power of AND:

This is one of my least favorite query patterns, because even with appropriate indexes, performance often isn’t very good without additional interventions.

Without indexes in place, or when “indexes aren’t used”, then the query plans will often look like one of these.

Maybe not always, but there are pretty common.

It’s something that I do wish the optimizer could be smarter about. One important thing to note in Erik’s demo: the OR clause is on two different columns, so SELECT x.Col1 FROM dbo.TblX x WHERE x.ID = 8 OR x.ID = 7 works fine, but WHERE x.ID = 8 OR x.SomethingElse = 14 is liable to cause performance issues on a large enough table.

Comments closed

Restoring a TDE Database to a Different Server

Chad Callihan reminds us of the other half of backups:

You setup TDE for your database. Great! Now you need to restore that database to a different server. Maybe you’re migrating off of an older server or maybe there is testing to be completed for an upcoming release. If we try to backup/restore or attach the database on a different server we’ll get some variation of an error stating “cannot find server certificate with thumbprint…” and the process will fail. There are a few steps to get the TDE database restored and we’re going to walk through them today.

Read on for those instructions.

Comments closed

Finding Spinlock Owners in Dump Files

Forrest McDaniel leaves us with a head injury:

Now, I may not be an expert, but sqlmin!Spinlock sounds like…a spinlock. This thread has been spinning for over a minute, never returning to a waiting state, because something else is holding the spinlock resource.

Thankfully, helpful friends alerted me to a blog that revealed the value of an acquired spinlock “is the Windows thread ID of the owner.” Meaning I might be able to find the cause.

Read the whole thing.

Comments closed

IDENTITY Overflow in SSIS

Alex Stuart hits a weird error:

Conversion/overflow errors aren’t that unusual – normally a data flow broken by some unexpected data (“no, there’s no chance that field would ever have a character in it”), or perhaps a column hitting max size (“INT will be enough for years, like, 5 years. I’ll have left the company by then”)

But that wasn’t the case here – the package and user tables involved were checked by the dev team and there was no possible overflow. I’d checked system databases for maxed-out identity columns and found nothing. Heads were scratched.

Read on for the post-head-scratch answer.

Comments closed

Automating Montoring

Thomas Williams has a multi-part series. Part 1 is a big picture primer:

It would be nice if every problem, now and in the future, was automatically monitored by an intuitive, cheap, flexible tool that only raised the alarm when there was a real problem, at the right time, to the right people.

However, in reality, monitoring is a game of whack-a-mole, built on an ad-hoc collection of tools and technologies, requiring know-how to interpret the often puzzling situations which may be high priority – or just background noise.

Part 2 continues the big picture look:

Monitoring is here to stay. There’s never enough disk space, memory, CPU, network throughput. If they ever existed, perfect conditions don’t persist, whether because of a downstream system failing, newly-discovered security issue, change in process, extraordinary amount of load etc. And, systems grow and change to meet new requirements, so yesterday’s monitoring may not meet tomorrow’s uptime goals.

Following on from part 1, here are some further “big picture” considerations for automated monitoring for the DBA:

Part 3 gets to ideas on instrumentation:

With some of the broader ideas out of the way, if there’s no monitoring in place and I was the “accidental”/”default” DBA, there are some metrics I’d definitely want to monitor. However, there’s a ton of real-time metrics I haven’t included below like CPU & memory use, number of current connections, disk I/O as at right now. They’re probably not good candidates for “roll-your-own”-type monitoring.

I consider “alerts” in the table below to include an e-mail, SMS or dashboard item that aims to trigger a response to fix. The collection frequency especially is just a guide – closer to real-time might be desirable, but the trade-off is impact on servers and connected apps.

Check out all three posts so far in the series.

Comments closed

Ordering and Sorting Data in Spark

Landon Robinson shows how to sort data in Spark RDDs and DataFrames:

In the analysis section of Spark Starter Guide 4.6: How to Aggregate Data, we asked these questions: “Who is the youngest cat in the data? Who is the oldest?”

Let’s use ordering in Spark as an alternative method to answer those same questions, and achieve the same result. Specifically, let’s again find the youngest and oldest cats in the data.

Click through for plenty of examples.

Comments closed

Time Series Data in PostgreSQL

Michael Grogan has a few examples of working with time series data in PostgreSQL:

Tools such as Python or R are most often used to conduct deep time series analysis.

However, knowledge of how to work with time series data using SQL is essential, particularly when working with very large datasets or data that is constantly being updated.

Here are some useful commands that can be invoked in SQL to better work with time series data within the data table itself.

Click through for examples like using a window function to calculate moving averages and using time zones. H/T Mark Hutchinson.

Comments closed

THROW and Linked Servers

Chad Baldwin hits on an interesting result when using THROW across a linked server:

The THROW command is non-terminating if it is used in a stored procedure over a linked-server.

I don’t know the details to why it works this way. The THROW command returns an error message with a severity level of 16, which, according to my RAISERROR Cheatsheet, does not stop execution.

There’s something special about the THROW command beyond raising an error message. Behind the scenes, there is likely some extra information being passed to tell SQL Server that execution needs to stop in that moment, and that extra bit of information does not appear to be passed between linked servers.

Click through for a demo.

Comments closed

Power BI Content Certification and Promotion

Melissa Coates talks about certification (though not certifications):

The number of people who are allowed to certify content should be pretty limited, and should only include people who are knowledgeable about both the data and the certification requirements. Ideally, certification should only be done by domain/subject matter experts — and this isn’t usually your Power BI administrator. In larger organizations, you might reference multiple Azure AD security groups in the tenant setting – but it should remain a pretty limited number of people to ensure that the certification designation is truly meaningful. (Note that we currently cannot specify who can endorse content on a per-workspace level.)

Read on for more information about content certification, as well as content promotion, in Power BI.

Comments closed