Context Switches In SQL Server

Ewald Cress continues his journey to the center of the SQLOS:

The SQLOS scheduler exists in the cracks between user tasks. As we’re well aware, in order for scheduling to happen at all, it is necessary for tasks to run scheduler-friendly code every now and again. In practice this means either calling methods which have the side effect of checking your quantum mileage and yielding if needed, or explicitly yielding yourself when the guilt gets too much.

Now from the viewpoint of the user task, the experience of yielding is no different than the experience of calling any long-running CPU-intensive function: You call a function and it eventually returns. The real difference is that the CPU burned between the call and its return was spent on one or more other threads, while the current thread went lifeless for a bit. But you don’t know that, because you were asleep at the time!

Definitely read the whole thing.

Clippy Lives: In Scala

Akhil Vijayan explains Scala Clippy:

Now you may be wondering how these errors are identified and we get advice related to it.

Simple, these are provided by the Scala community. If you visit their official website Scala Clippy where you can find a tab “Contribute”. Under that, we can post our own errors. These errors are parsed first, and when successful we can add our advice which will be reviewed and if accepted it will be added to their database which will, in turn, be beneficial to others.

Take a close look at the screenshots; I missed it at first, but there’s helpful advice above the error message.

More On S3Guard

Aaron Fabbri describes how S3Guard works:

Although Apache Hadoop has support for using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves different than HDFS.  One of the key differences is in the level of consistency provided by the underlying filesystem.  Unlike HDFS, S3 is an eventually consistent filesystem.  This means that changes made to files on S3 may not be visible for some period of time.

Many Hadoop components, however, depend on HDFS consistency for correctness. While S3 usually appears to “work” with Hadoop, there are a number of failures that do sometimes occur due to inconsistency:

  • FileNotFoundExceptions. Processes that write data to a directory and then list that directory may fail when the data they wrote is not visible in the listing.  This is a big problem with Spark, for example.

  • Flaky test runs that “usually” work. For example, our root directory integration tests for Hadoop’s S3A connector occasionally fail due to eventual consistency. This is due to assertions about the directory contents failing. These failures occur more frequently when we run tests in parallel, increasing stress on the S3 service and making delayed visibility more common.

  • Missing data that is silently dropped. Multi-step Hadoop jobs that depend on output of previous jobs may silently omit some data. This omission happens when a job chooses which files to consume based on a directory listing, which may not include recently-written items.

Worth reading if you’re looking at using S3 to store data for Hadoop.  Also check out an earlier post on the topic.

Rebuilding Versus Reorganizing Rowstore Indexes

Paul Randal explains the difference between rebuilding and reorganizing rowstore indexes:

Rebuilding an index requires building a new index before dropping the old index, regardless of the amount of fragmentation present in the old index. This means you need to have enough free space to accommodate the new index.

Reorganizing an index first squishes the index rows together to try to deallocate some index pages, and then shuffles the remaining pages in place to make their physical (allocation) order the same as the logical (key) order. This only requires a single 8-KB page, as a temporary storage for pages being moved around. So an index reorganize is extremely space efficient, and is one of the reasons I wrote the original DBCC INDEXDEFRAG for SQL Server 2000 (the predecessor of ALTER INDEX … REORGANIZE).

If you have space constraints, and can’t make use of single-partition rebuild, reorganizing is the way to go.

Click through for the rest of the story.

Auto-Install Docker And SQL Tools On Linux

Andrew Pruski has a script on GitHub:

So I’ve created a repository on GitHub that pulls together the code from Docker to install the Community Edition and the code from Microsoft to install the SQL command line tools.

The steps it performs are: –

  • Installs the Docker Community Edition

  • Installs the SQL Server command line tools

  • Pulls the latest SQL Server on Linux image from the Docker Hub

Read on for more details and some limitations.

SQL Server 2017 RC2

Microsoft has announced Release Candidate 2 of SQL Server 2017, hot on the heels of RC1:

Microsoft is pleased to announce availability of SQL Server 2017 Release Candidate 2 (RC2), which is now available for download.

The release candidate represents an important milestone for SQL Server.  Development of the new version of SQL Server along most dimensions needed to bring the industry-leading performance and security of SQL Server to Windows, Linux, and Docker containers is complete.  We are continuing to work on performance and stress testing of SQL Server 2017 to get it ready for your most demanding Tier 1 workloads, as well as some final bug fixes.

There are no new features and the Windows release notes are empty, but there are some Linux release notes as they firm up that offering before launch.

Web Editor For Azure Analysis Services

James Serra shows off a preview of the Azure Analysis Services web designer for tabular models:

Microsoft has released a preview of the Azure Analysis Services web designer.  This is a browser-based experience that will allow developers to start creating and managing Azure Analysis Services (AAS) semantic models quickly and easily.  SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS) will still be the primary tools for development, but this new designer gives you another option for creating a new model or to do things such as adding a new measure to a development or production AAS model.

A highly requested feature is that you can import a Power BI Desktop file (.pbix) into an Analysis Services database.  And once imported you can reverse engineer to Visual Studio.  Note for PBIX import only Azure SQL Database, Azure SQL Data warehouse, Oracle, and Teradata are supported at this time and Direct Query models are not yet supported for import (Microsoft will be adding new connection types for import every month).

Read on for more details.

Calculating Pi In SQL Server

Slava  Murygin has fun calculating Pi in SQL Server:

It goes like:
N1 BaseType : numeric
N1 Precision : 10
N1 Scale : 0

Interesting to know. SQL Server interprets integers higher than 2,147,483,647 as Numeric, not as BIGINT !!!

Slava digs up a couple T-SQL edge cases worth noting.

Conditional Counts

Mark Broadbent shows that COUNT has a few tricks up its sleeve:

When I came to compare the results against aggregated data that I had, I noticed that the values were off and it became fairly obvious that the transactional data also contained refunds and rebates (positive values but logically reflected as negative by the Transaction_Type status) and these were not just causing inaccuracies for the SUM on Sales_Value, but were also causing the COUNT for Number_Of_Sales to be wrong. In other words, refunds and rebates must be removed from the SUM total and not aggregated in the Number_Of_Sales columns. Now at this stage, you might be thinking that we can do this by a simple WHERE clause to filter them from the aggregates, but not only is it wrong to “throw away” data, I realised that my target tables also contained aggregate columns for refunds and rebates.

I have only used the SUM(CASE) method that Mark shows.  It’s interesting that COUNT(CASE) can work, but I agree that it is probably more confusing, if only because it’s so rare.


August 2017
« Jul