Curated SQL Posts

Trivial Plans With Columnstore

Dmitry Pilugin looks at columnstore query behavior when trivial plans get involved:

Both queries are now fully optimized, and that leads to different plans. First of all, both queries run in Batch Mode, which is much faster than Row Mode.

In the first query, we see a Hash Match Aggregate instead of a Stream Aggregate. More to the point, you may see that the Actual Number of Rows is 0 because all the rows were aggregated locally at the Storage Engine level; the property Actual Number of Locally Aggregated Rows = 60855 reflects this. This is faster than regular aggregation and is known as Aggregate Pushdown.

In the second query, you may observe a new Window Aggregate operator, which is faster than a Window Spool and also runs in Batch Mode.

Read the whole thing. Dmitry also looks at SQL Server vNext and how it handles the same trivial-plan-generating scenario.
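To see the behavior at home, here is a minimal sketch, assuming a hypothetical dbo.FactSales table: with a clustered columnstore index and a realistic row count, a simple scalar aggregate like this is a candidate for batch mode and aggregate pushdown.

-- Hypothetical demo objects; pushdown needs the actual execution plan
-- enabled and a meaningful number of rows to be interesting.
CREATE TABLE dbo.FactSales
(
    SaleID   int   NOT NULL,
    Quantity int   NOT NULL,
    Amount   money NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales;

-- In the plan, check the Columnstore Index Scan for the property
-- Actual Number of Locally Aggregated Rows to confirm the pushdown.
SELECT SUM(Quantity) AS TotalQuantity
FROM dbo.FactSales;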

More On Columnstore String Handling

Niko Neugebauer focuses on two potential columnstore issues, NULL strings and string aggregation:

You can see a filter iterator taking an estimated 9% of the resources and filtering over 5.1 million rows that were read from the lineitem_cci table (the Columnstore Index Scan operation). Before the filter took place, all of the table’s data was read and the aggregate values were calculated, meaning a significant waste of resources (when the data was being filtered and while it was occupying memory).

The predicate properties of the Columnstore Index Scan are shown in the picture on the left, and you can see that nothing has been pushed into the Storage Engine, making this query perform quite slowly. On my local test VM it takes on average 0.7 seconds to execute while spending over 1.9 seconds of CPU time. This happens because the Columnstore Indexes are unable to push a NULL expression predicate into the storage engine.

To solve this problem in this test case, I will need to rewrite the query and remove the NULL expression, substituting = ‘RAIL’ OR IS NULL logic for it:
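(A reconstruction follows, not Niko’s exact code: a minimal sketch of the rewrite, assuming the predicate targets a TPC-H-style l_shipmode column.)

-- Hypothetical reconstruction: replacing a NULL expression such as
-- ISNULL(l_shipmode, 'RAIL') = 'RAIL' with plain predicates lets the
-- Columnstore storage engine evaluate the filter itself.
SELECT COUNT(*)
FROM dbo.lineitem_cci
WHERE l_shipmode = 'RAIL'
   OR l_shipmode IS NULL;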

Definitely worth a read.

Avoiding Hard-Coded Paths In PowerShell

Jana Sattainathan gives a few options for getting around hard-coded paths in a PowerShell script:

What if the KillerApp’s home folder suddenly moves?

Now, how do you make your app work with all its scripts without having to change code if you move it to a different folder?

You could now change the initial script that dot-sources all the functions to alter the path, and you are all set. This is still not ideal, because you have to make a change whenever the location changes.

Click through for several options, including PSDrives and even automatically dot-sourcing PowerShell scripts in the current directory and all child directories.
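One generic pattern (a minimal sketch, not Jana’s exact code, assuming the function scripts live in a Functions subfolder) is to build every path relative to the automatic $PSScriptRoot variable:

# $PSScriptRoot holds the folder the current script lives in, so these
# paths keep working if the app's home folder moves.
$functionFolder = Join-Path -Path $PSScriptRoot -ChildPath 'Functions'

# Dot-source every function script in the folder and its child folders.
foreach ($script in Get-ChildItem -Path $functionFolder -Filter '*.ps1' -Recurse) {
    . $script.FullName
}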

Batch Requests Per Second

Tara Kizer explains the Batch Requests/sec metric:

WHAT IS BATCH REQUESTS/SEC?

Batch Requests/sec is a performance counter that tells us the number of T-SQL command batches received by the server per second. It is in the SQLServer:SQL Statistics performance object for a default instance or MSSQL$InstanceName:SQL Statistics for a named instance.

WHAT COMPRISES A BATCH?

If I execute a stored procedure that has multiple queries and calls to other stored procedures, what will Batch Requests/sec show? Let’s test it to find out.

Click through for the answer.
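If you want to watch the counter while you read, here is a minimal sketch; the DMV value is cumulative, so you sample it twice and divide by the interval.

-- Batch Requests/sec is cumulative in sys.dm_os_performance_counters,
-- so take two samples ten seconds apart and compute the rate.
DECLARE @first bigint, @second bigint;

SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec'
  AND object_name LIKE '%SQL Statistics%';

WAITFOR DELAY '00:00:10';

SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec'
  AND object_name LIKE '%SQL Statistics%';

SELECT (@second - @first) / 10.0 AS BatchRequestsPerSec;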

Deadlock Priority

Kenneth Fisher tells a story of a deadlock:

Why does it matter that they were system sessions? The important thing to remember here is that these sessions cannot be KILLed. So because they were holding locks on the database (and somehow, even though it was in single user mode, there were multiple sessions with locks in the database; don’t ask me how), I wasn’t able to get the exclusive access I needed.

Interestingly, when I tried to do the ALTER, instead of just hanging, I immediately got a deadlock error. I spent a little while trying various things and searching through forums before I went for help on Twitter using the #SQLHELP hashtag.

Read on for the answer, including how deadlock priorities saved the day.
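For background, deadlock priority is a session-level setting; here is a minimal sketch of the general idea (illustrative only, not necessarily the exact fix from the post).

-- A session running at a higher deadlock priority is less likely to be
-- chosen as the deadlock victim. Valid values: LOW, NORMAL, HIGH, or an
-- integer from -10 to 10.
SET DEADLOCK_PRIORITY HIGH;

ALTER DATABASE [SomeDatabase]   -- hypothetical database name
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;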

Twitter Sentiment Analysis Using doc2vec

Sergey Bryl uses word2vec and doc2vec to perform Twitter sentiment analysis in R:

But doc2vec is a deep learning algorithm that draws context from phrases. It’s currently one of the best ways to do sentiment classification for movie reviews. You can use the following method to analyze feedback, reviews, comments, and so on, and you can expect better results than with tweet analysis, because tweets usually include lots of misspellings.

We’ll use tweets for this example because it’s pretty easy to get them via the Twitter API. We only need to create an app on https://dev.twitter.com (the My Apps menu) and find the API Key, API Secret, Access Token, and Access Token Secret on the Keys and Access Tokens menu tab.

Click through for more details, including code samples.
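As a taste of the setup step, here is a minimal sketch using the twitteR package (an assumption on my part; the post may use a different client), with placeholder credentials:

# Substitute the four values from your app's Keys and Access Tokens tab;
# the strings below are placeholders.
library(twitteR)

setup_twitter_oauth(consumer_key    = "API_KEY",
                    consumer_secret = "API_SECRET",
                    access_token    = "ACCESS_TOKEN",
                    access_secret   = "ACCESS_SECRET")

# Pull a small batch of tweets to classify.
tweets <- searchTwitter("#sqlserver", n = 100)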

Plan Comparison

Max Vernon looks at plan comparison in SQL Server Management Studio:

The table has 10,000,000 rows. I’ve created a non-clustered columnstore index on the table, which I’ll talk about in a future post. I’ve included it here because it provides a succinct difference between the two plans.

To compare the plans visually, side-by-side, you need to save the first plan by right-clicking on the plan window, clicking “Save Execution Plan As…”, and specifying a filename. Next, right-click on the plan window and choose “Compare Showplan”.

I’ve only used this once or twice, but it is an interesting feature.
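For reference, the index the excerpt mentions would look something like this (a sketch with hypothetical table and column names):

-- Hypothetical names: a nonclustered columnstore index covering the
-- columns the query touches gives the optimizer a second plan shape,
-- which makes for an instructive comparison.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SomeTable
ON dbo.SomeTable (Col1, Col2, Col3);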

Hadoop In The Cloud

Peter Coates talks about pros and cons to Hadoop in the cloud:

Hadoop was developed for deployment over Linux running on bare metal. Cloud deployment implies virtual machines, and for Hadoop it’s a huge difference.

As detailed in other articles (for instance, Your Cluster Is an Appliance or Understanding Hadoop Hardware Requirements), bare-metal deployments have an inherent advantage over virtual machine deployments. The biggest of these is that they can use direct attached storage, i.e., local disks.

Not every Hadoop workload is storage I/O bound, but most are, and even when Hadoop seems to be CPU bound, much of the CPU activity is often either directly in service of I/O (e.g., marshaling, unmarshaling, and compression) or in service of avoiding I/O (e.g., building in-memory tables for map-side joins).

Read the whole thing.
