Press "Enter" to skip to content

Curated SQL Posts

R 3.6.3 Now Available

David Smith takes a look at R 3.6.3:

On February 29, R 3.6.3 was released and is now available for Windows, Linux and Mac systems. This update, codenamed “Holding the Windsock”, fixes a few minor bugs, and as a minor update maintains compatibility with scripts and packages written for prior versions of R 3.6.

February 29 is an auspicious date, because that was the day that R 1.0.0 was released to the world: February 29, 2000. In the video below from the CelebRation2020 conference marking the 20th anniversary of R, core member Peter Dalgaard reflects on the origins of R, and releases R 3.6.3 live on stage (at the 33-minute mark).

I’m holding out for R 4, though then I’ll have to wait to see when SQL Server will officially support it.


Adoption Patterns with Query Store

Erin Stellato has some thoughts on Query Store adoption:

Last fall we had a former customer reach out for help after they had to wait 45 minutes for a database to come online after a server reboot. The database queries were blocked by QDS_LOADDB waits. There were three things in play here – the first was that they had CAPTURE_MODE set to ALL, and it should be AUTO. Second, they didn’t have trace flag 7752 enabled (the behavior of which is now the default in SQL Server 2019). And the third was that their Query Store was 100GB in size. The workload was fairly ad hoc, so these three things together caused the problem initially described. They implemented the TF, made multiple changes to the settings (set CAPTURE_MODE to AUTO, changed MAX_STORAGE_SIZE_MB to 10GB, decreased CLEANUP_POLICY to 3 days), and then Query Store was usable for them.

Read on for more examples.
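For reference, a minimal sketch of the fixes Erin describes might look like the following. The database name is hypothetical, and trace flag 7752 is typically set as a startup parameter (-T7752) so it survives restarts:

```sql
-- Enable asynchronous Query Store load at startup (the default behavior in SQL Server 2019).
DBCC TRACEON (7752, -1);

-- Apply the settings from the scenario above: AUTO capture mode,
-- a 10 GB storage cap, and a three-day cleanup window.
ALTER DATABASE [YourDatabase]
SET QUERY_STORE (
    QUERY_CAPTURE_MODE = AUTO,
    MAX_STORAGE_SIZE_MB = 10240,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 3)
);
```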


Using Query Store to Replay Workloads

John Sterrett shows us an unorthodox use of Query Store:

Today, I wanted to talk about my least favorite part of replaying workloads. It’s having an extended event or server-side trace running during a workload replay only so we can compare the results at the query level when the replay is finished. Now, this might seem like a trivial thing, but when you have workloads over 10k batch requests/sec this can consume terabytes of data quickly. The worst part is waiting to read all the data, then slicing and dicing it for analysis.

Starting with SQL Server 2016 there is a better and faster way to go! You can replace your extended event or server-side trace with Query Store captured data. Today, I will show you how to use the Query Store for the same purpose.

Click through for the solution.
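These aren’t John’s queries (click through for those), but as a rough sketch of the idea, the Query Store catalog views can yield a query-level comparison once the replay finishes:

```sql
-- Aggregate per-query runtime stats captured by Query Store during the replay.
-- Durations are in microseconds; to scope the results to the replay window,
-- filter further against sys.query_store_runtime_stats_interval.
SELECT
    qt.query_sql_text,
    SUM(rs.count_executions)                           AS executions,
    SUM(rs.avg_duration * rs.count_executions)         AS total_duration_us,
    SUM(rs.avg_logical_io_reads * rs.count_executions) AS total_logical_reads
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
GROUP BY qt.query_sql_text
ORDER BY total_duration_us DESC;
```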


Finding Query Store’s Server Impact

Tracy Boggiano shows us how to track the performance impact of Query Store on an environment:

This month’s T-SQL Tuesday blogging party is brought to you, well, by me, and I wanted to talk more about Query Store.  I did write a book on it, but there is still more to know about it that is not in the book.  I am sure the rest of everyone’s posts will prove enlightening and provide valuable content for folks using or looking to implement Query Store.  Someone should have told Grant to hold off a week on his post about DROP / CREATE of procedures and what happens with plan forcing so it could officially be part of the party.

I frequently get asked while presenting about the impact of running Query Store on the instance and one thing that was not in the book was the performance counters that were added to help track just that.

You should probably buy a bunch of copies of Tracy’s book. Just in case.
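If you want a quick look at those counters on your own instance, they surface through a standard DMV; the exact counter names vary by version:

```sql
-- Query Store performance counters (SQL Server 2017+), e.g. CPU usage
-- and logical reads/writes. The LIKE handles named-instance prefixes.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Query Store%';
```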


Query Store and Spinlocks

David Fowler takes a look at an issue with Query Store:

We moseyed on down to the server in question to take a look at it.  One thing stood out immediately: CPU was pegged out at 100%, but SQL itself didn’t actually seem to be doing anything, and transactions per second was on the floor. Unfortunately, this happened a while back and I didn’t think to capture any graphs or metrics at the time, so you’re just going to have to take my word for this.

After looking into a few different things, the mention of spinlock contention came up.  I’ll be honest here: actual spinlock contention is rare, and it’s something that I’ve seen cause an issue only a handful of times, so I don’t generally get to it until I’ve ruled out just about everything else.

David’s scenario was on an older patch of SQL Server, and the issue was fixed in a later build. It’s a good reminder to keep those servers patched.
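If you suspect something similar, a quick check of the spinlock DMV shows whether the Query Store spinlocks are the hot ones; the specific spinlock names vary by version:

```sql
-- High spins and climbing backoffs on these rows suggest Query Store
-- spinlock contention.
SELECT name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
WHERE name LIKE 'QUERY_STORE%'
ORDER BY spins DESC;
```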


Sqoop Scheduling and Security

Jon Moirsi continues a series on Sqoop:

In previous articles, I’ve walked through using Sqoop to import data to HDFS.  I’ve also detailed how to perform full and incremental imports to Hive external and Hive managed tables.

In this article I’m going to show you how to automate execution of Sqoop jobs via Cron.

However, before we get to scheduling we need to address security.  In prior examples I’ve used -P to prompt the user for login credentials interactively.  With a scheduled job, this isn’t going to work.  Fortunately Sqoop provides us with the “password-alias” arg which allows us to pass in passwords stored in a protected keystore.

That particular keystore tie-in works quite smoothly in my experience.


Monitoring Data Quality on Streaming Data

Abraham Pabbathi and Greg Wood want to check data quality on Spark Streaming data:

While the emergence of streaming in the mainstream is a net positive, there is some baggage that comes along with this architecture. In particular, there has historically been a tradeoff: high-quality data, or high-velocity data? In reality, this is not a valid question; quality must be coupled to velocity for all practical purposes — to achieve high velocity, we need high quality data. After all, low quality at high velocity will require reprocessing, often in batch; low velocity at high quality, on the other hand, fails to meet the needs of many modern problems. As more companies adopt streaming as a lynchpin for their processing architectures, both velocity and quality must improve.

In this blog post, we’ll dive into one data management architecture that can be used to combat corrupt or bad data in streams by proactively monitoring and analyzing data as it arrives without causing bottlenecks.

This was one of the sticking points of the lambda architecture: new data could still be incomplete and possibly wrong, but until it reached the batch layer, you wouldn’t know it.


Two Query Store Stories

Mark Wilkinson gives us two separate takes on Query Store:

When the Query Data Store (QDS) feature was announced for SQL Server 2016, we were excited about the prospect of being able to have deep insight on any query running in our environment. I work for a company that deals heavily in the e-commerce space, and we have a large SQL Server footprint. Our environment is unique in that it is essentially a multi-tenant system, but all the tenants could have wildly different workloads. It’s really the kind of query execution scenario QDS was built for. We had the pleasure of working with the Microsoft SQLCAT team to get 2016 and QDS up and running in our production environment before it was GA.

In this post I’m going to share two stories about our QDS experience (from before and after the feature went GA): one from the perspective of the Database Developer, and one from the Database Administrator. For the most part this is not a technical post full of queries and code samples. It’s just me talking about some things I have experienced using QDS in production.

CentralQDS, by the way, is really cool. Hopefully we’re able to show that to the world someday (and note how I say “we” even though I did absolutely nothing with it except for being in the same company as the people who developed it).


Improving vCenter Performance Metric Logging

David Klee has some recommendations on settings for vCenter performance metric collection:

By default, vCenter starts rolling up performance metric statistics into aggregates after just one hour. Much of the data necessary for troubleshooting performance challenges reported either the same day or the previous day is lost from the vCenter data, forcing the administrator to revert to cumbersome and/or time-consuming tooling, such as vRealize Operations Manager. DBAs might not have access to such tools. Hopefully by now they have read-only access to vCenter!

The vCenter performance statistics collection and rollup settings can be customized to provide a longer window of time for critical metrics to be available to the administrator for management.

Click through for some recommendations of aggregation intervals and collection durations to help with virtual machine troubleshooting.


Loading Data from CSVs with Inconsistent Quoted Identifiers

Dave Mason has some fun with loading data from files:

BCP and OPENROWSET are long-lived SQL Server options for working with data in external files. I’ve blogged about OPENROWSET, including a recent article showing a way to deal with quoted data. One shortcoming I’ve never found a way around is an inconsistent data file, with data fields in some rows enclosed in double quotes but not in others.

Let’s demonstrate with BCP. Below is a sample data file I’ll attempt to load into a SQL Server table. Note the data fields highlighted in yellow, which are enclosed in double quotes and contain the field terminator character (a comma). For reference, the file is also available on GitHub.

I get unduly frustrated with the implementations of various data loaders around SQL Server and how they handle quoted identifiers differently. And don’t get me started on PolyBase.
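One possible out, offered with appropriate hedging: starting with SQL Server 2017, BULK INSERT accepts FORMAT = 'CSV', whose RFC 4180 parser allows a field to be quoted in one row and bare in the next. A sketch, with a hypothetical table and file path:

```sql
-- Assumes SQL Server 2017+; dbo.SampleData and the path are hypothetical.
BULK INSERT dbo.SampleData
FROM 'C:\Data\sample.csv'
WITH (
    FORMAT = 'CSV',        -- RFC 4180 parsing, tolerating mixed quoting
    FIELDQUOTE = '"',
    FIRSTROW = 2,          -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```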
