Press "Enter" to skip to content

Curated SQL Posts

Left Versus Right Joins

Denis Gobo doesn’t like RIGHT JOIN:

Do you use RIGHT JOINs? I myself rarely use a RIGHT JOIN, I think in the last 17 years or so I have only used a RIGHT JOIN once or twice. I think that RIGHT JOINs confuse people who are new to databases, everything that you can do with a RIGHT JOIN, you can also do with a LEFT JOIN, you just have to flip the query around

So why did I use a RIGHT JOIN then?

Don’t be lazy; switch out those right joins.  The trick is that for every RIGHT JOIN statement, there is an equivalent statement which does not use RIGHT JOIN.  The percentage of the time that you might benefit from RIGHT JOIN is so low that the fixed costs of mentally processing what’s going on tend to overwhelm the slight benefit of that style of join.

Comments closed

Going Back From The Cloud

Arun Sirpal notes that you can take a cloud database back to on-premises:

The Challenge: I am going to write about a way to move from Azure SQL Database (Platform as a service) back to a local SQL Server. I did encounter errors on the way but more importantly I have written how to avoid/solve them.

Another key point I made sure that there were no connections to the database when doing the below as I didn’t want in-flight data movement whilst doing it. If you can’t do this, then you should create a copy of the database and work from that.

It’s not a trivial operation, but Arun does walk us through the steps.

Comments closed

ARITHABORT And ANSI_WARNINGS

Shane O’Neill looks at what the ARITHABORT and ANSI_WARNINGS settings do in SQL Server:

So, like a dog when it sees a squirrel, when I found out about the problems with ARITHABORT and ANSI_WARNINGS I got distracted and started checking out what else I could break with it. Reading through the docs, because I found that it does help even if I have to force myself to do it sometimes, I found a little gem that I wanted to try and replicate. So here’s a reason why you should care about setting ARITHABORT and ANSI_WARNINGS on.

These are two settings where the default value makes a lot of sense.

Comments closed

Context Switches In SQL Server

Ewald Cress continues his journey to the center of the SQLOS:

The SQLOS scheduler exists in the cracks between user tasks. As we’re well aware, in order for scheduling to happen at all, it is necessary for tasks to run scheduler-friendly code every now and again. In practice this means either calling methods which have the side effect of checking your quantum mileage and yielding if needed, or explicitly yielding yourself when the guilt gets too much.

Now from the viewpoint of the user task, the experience of yielding is no different than the experience of calling any long-running CPU-intensive function: You call a function and it eventually returns. The real difference is that the CPU burned between the call and its return was spent on one or more other threads, while the current thread went lifeless for a bit. But you don’t know that, because you were asleep at the time!

Definitely read the whole thing.

Comments closed

Clippy Lives: In Scala

Akhil Vijayan explains Scala Clippy:

Now you may be wondering how these errors are identified and we get advice related to it.

Simple, these are provided by the Scala community. If you visit their official website Scala Clippy where you can find a tab “Contribute”. Under that, we can post our own errors. These errors are parsed first, and when successful we can add our advice which will be reviewed and if accepted it will be added to their database which will, in turn, be beneficial to others.

Take a close look at the screenshots; I missed it at first, but there’s helpful advice above the error message.

Comments closed

More On S3Guard

Aaron Fabbri describes how S3Guard works:

Although Apache Hadoop has support for using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves different than HDFS.  One of the key differences is in the level of consistency provided by the underlying filesystem.  Unlike HDFS, S3 is an eventually consistent filesystem.  This means that changes made to files on S3 may not be visible for some period of time.

Many Hadoop components, however, depend on HDFS consistency for correctness. While S3 usually appears to “work” with Hadoop, there are a number of failures that do sometimes occur due to inconsistency:

  • FileNotFoundExceptions. Processes that write data to a directory and then list that directory may fail when the data they wrote is not visible in the listing.  This is a big problem with Spark, for example.

  • Flaky test runs that “usually” work. For example, our root directory integration tests for Hadoop’s S3A connector occasionally fail due to eventual consistency. This is due to assertions about the directory contents failing. These failures occur more frequently when we run tests in parallel, increasing stress on the S3 service and making delayed visibility more common.

  • Missing data that is silently dropped. Multi-step Hadoop jobs that depend on output of previous jobs may silently omit some data. This omission happens when a job chooses which files to consume based on a directory listing, which may not include recently-written items.

Worth reading if you’re looking at using S3 to store data for Hadoop.  Also check out an earlier post on the topic.

Comments closed

Rebuilding Versus Reorganizing Rowstore Indexes

Paul Randal explains the difference between rebuilding and reorganizing rowstore indexes:

Rebuilding an index requires building a new index before dropping the old index, regardless of the amount of fragmentation present in the old index. This means you need to have enough free space to accommodate the new index.

Reorganizing an index first squishes the index rows together to try to deallocate some index pages, and then shuffles the remaining pages in place to make their physical (allocation) order the same as the logical (key) order. This only requires a single 8-KB page, as a temporary storage for pages being moved around. So an index reorganize is extremely space efficient, and is one of the reasons I wrote the original DBCC INDEXDEFRAG for SQL Server 2000 (the predecessor of ALTER INDEX … REORGANIZE).

If you have space constraints, and can’t make use of single-partition rebuild, reorganizing is the way to go.

Click through for the rest of the story.

Comments closed

Auto-Install Docker And SQL Tools On Linux

Andrew Pruski has a script on GitHub:

So I’ve created a repository on GitHub that pulls together the code from Docker to install the Community Edition and the code from Microsoft to install the SQL command line tools.

The steps it performs are: –

  • Installs the Docker Community Edition

  • Installs the SQL Server command line tools

  • Pulls the latest SQL Server on Linux image from the Docker Hub

Read on for more details and some limitations.

Comments closed

SQL Server 2017 RC2

Microsoft has announced Release Candidate 2 of SQL Server 2017, hot on the heels of RC1:

Microsoft is pleased to announce availability of SQL Server 2017 Release Candidate 2 (RC2), which is now available for download.

The release candidate represents an important milestone for SQL Server.  Development of the new version of SQL Server along most dimensions needed to bring the industry-leading performance and security of SQL Server to Windows, Linux, and Docker containers is complete.  We are continuing to work on performance and stress testing of SQL Server 2017 to get it ready for your most demanding Tier 1 workloads, as well as some final bug fixes.

There are no new features and the Windows release notes are empty, but there are some Linux release notes as they firm up that offering before launch.

Comments closed