Curated SQL – Page 1450 – A Fine Slice Of SQL Server

Problem:

With the new install of Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-87-generic x86_64), different attempts at installing SQL Server resulted in the error: “E: Unable to locate package mssql-server”.

Read on for the solution.

Comments closed

How Columnstore Delta Stores Get Created

Published 2017-11-08 by Kevin Feasel

Joe Obbish has a list of ways that delta stores get created:

I briefly reviewed the documentation written by Microsoft concerning the appearance of delta stores. Here’s a quote:

Rows go to the deltastore when they are:
Inserted with the INSERT INTO VALUES statement.
At the end of a bulk load and they number less than 102,400.
Updated. Each update is implemented as a delete and an insert.

There are also a few mentions of how partitioning can lead to the creation of multiple delta stores from a single insert. It seems as if the document is incomplete or a little misleading, but I admit that I didn’t exhaustively review everything. After all, Microsoft hides columnstore documentation all over the place.

This is a great compendium of ways in which you can shoot yourself in the foot with clustered columnstore indexes.

Comments closed

Migrating DBMail Settings

Published 2017-11-08 by Kevin Feasel

Frank Gill has a T-SQL script to help with database mail migration:

This week, I was working on a migration for a client. The migration was moving databases from a stand-alone instance to a two-node Availability Group. When it came to moving the Database Mail settings, I discovered they had 21 sets of profiles and accounts. Not wanting to manually create 42 Database Mail profiles, I set out to automate the process. A web search yielded this blog post by Iain Elder. This script does what I was looking for, but would only generate settings for a single Database Mail profile. Using Iain’s code as a starting point, I modified it to create Database Mail settings for all profiles on an instance. The script is listed below. I hope this simplifies your SQL Server migrations.

Click through for Frank’s script, and you might also be interested in Iain Elder’s script, linked above.

Comments closed

How Per-Second AWS Billing Helps With Data Processing

Published 2017-11-07 by Kevin Feasel

Prakash Chockalingam explains how AWS per-second billing can make resource allocation easier:

Because of the hourly increments in billing, users spend a lot of time playing a giant game of Tetris with their big data workloads — figuring out how to pack jobs to use every minute of the compute hour. Examples:

If a job could be run on 10 nodes and finished in 20 minutes, it was better to run it on fewer nodes so that it takes around 50 minutes. As a result, you would pay less.

Running two 10 node jobs that took 20 minutes in parallel would cost two times more than running them sequentially.

The above problem got compounded if there were many such jobs to be run. To handle this challenge, many organizations turned to a resource scheduler like YARN. Organizations were following the traditional on-premises model of setting up one or more big multi-tenant cluster on the cloud and running YARN to bin-pack the different jobs.

Read on to see how this has changed as a result of per-second billing.

Comments closed

Tracking Kafka Consumer Lag

Published 2017-11-07 by Kevin Feasel

Simarpreet Kaur Monga has a Scala-based example showing how to calculate Kafka offset lag for consumers:

The Consumer can subscribe to multiple topics, you need to pass the list of topics you want to consume from. For the sake of simplicity, I have just passed a single topic to consume from.

Now that the consumer has subscribed to the topic, it can consume from that topic.

The consumer maintains an offset to keep the track of the next record it needs to read.

Now, let us see how we can find the consumer offsets.

The Consumer offsets can be found using the method offset of class ConsumerRecord. This offset points to the record in a Kafka partition. The consumer consumes the records from the topic in the form of an object of class ConsumerRecord. The class ConsumerRecord also consists of a topic name and a partition number from which the record is being received, and a timestamp as marked by the corresponding ProducerRecord (the record sent by the producer).

Click through for the rest of the story.

Comments closed

Soft-NUMA Doesn’t Limit MAXDOP

Published 2017-11-07 by Kevin Feasel

Lonny Niederstadt tests whether soft-NUMA forces MAXDOP = 1:

I mentioned that I was planning to set up a soft-NUMA node for each vcpu on a 16 vcpu VM, to evenly distribute incoming connections and thus DOP 1 queries over vcpus. Thomas Kejser et al used this strategy to good effect in “The Data Loading Performance Guide”, which used SQL Server 2008 as a base.
https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

My conversation partner cautioned me that leaving this soft-NUMA configuration in place after the specialized workload would result in DOP 1 queries whether I wanted them or not. The claim was, effectively, a parallel query plan generated by a connection within a soft-NUMA node would have its MAXDOP restricted by the scheduler count (if lower than other MAXDOP contributing factors). Though I wasn’t able to test at the time, I was skeptical: I’d always thought that soft-NUMA was consequential to connection placement, but not to MAXDOP nor to where parallel query workers would be assigned.

I’m back home now… time to test!!

Read on for the test.

Comments closed

Picking Azure VM Sizes

Published 2017-11-07 by Kevin Feasel

Glenn Berry helps us pick the right-sized Azure VM for a SQL Server installation:

A common issue with Azure VM sizing for SQL Server has been the fact that you were often forced to select a VM size that had far more virtual CPU cores than you needed or wanted in order to have enough memory and storage performance to support your workload, which increased your monthly licensing cost.

Luckily, Microsoft has recently made the decision process a little easier for SQL Server with a new series of Azure VMs that use some particular VM sizes (DS, ES, GS, and MS), but reduce the vCPU count to one quarter or one half of the original VM size, while maintaining the same memory, storage and I/O bandwidth. These these new VM sizes have a suffix that specifies the number of active vCPUs to make them easier to identify.

For example, a Standard_DS14v2 Azure VM would have 16 vCPUs, 112GB of RAM, and support up to 51,200 IOPS or 768MB/sec of sequential throughput (according to Microsoft). A new Standard_DS14-8v2 Azure VM would only have 8 vCPUs, with the same memory capacity and disk performance as the Standard_DS14v2, which would reduce your SQL Server licensing cost per year by 50%. Both of these Azure VM SKUs would have the same ACU score of 160.

Glenn is, as always, a font of useful information. Go read the whole thing.

Comments closed

Unresolved Reference In Database Project

Published 2017-11-07 by Kevin Feasel

Ed Elliott explains something fairly straightforward and gives us a primer on using Stack Overflow too:

If you build an SSDT project you can get an error which says:

“SQL71502: Function: [XXX].[XXX] has an unresolved reference to object [XXX].[XXX].”

If the code that is failing is trying to use something in the “sys” schema or the “INFORMATION_SCHEMA” schema then you need to add a database reference to the master dacpac:

Click through for the answer and a comment-by-comment walkthrough from Ed.

Comments closed

Window Function Sort Performance

Published 2017-11-07 by Kevin Feasel

Lukas Eder explains one potential issue with window functions against large data sets:

Usually, this blog is 100% pro window functions and advocates using them at any occasion. But like any tool, window functions come at a price and we must carefully evaluate if that’s a price we’re willing to pay. That price can be a sort operation. And as we all know, sort operations are expensive. They follow O(n log n) complexity, which must be avoided at all costs for large data sets.

In a previous post, I’ve described how to calculate a running total with window functions (among other ways). In this post, we’re going to calculate the cumulative revenue at each payment in our Sakila database.

This is a good article comparing how different RDBMS products handle a fairly complicated windowed query and what you can do to improve performance.

Comments closed

Multiple Result Sets With ML Services

Published 2017-11-07 by Kevin Feasel

Dave Mason figures out how to create multiple result sets with SQL Server ML Services:

Of course for this strategy to work, I’d have to know ahead of time how many data frames/HTML tables there are. Hmmm. Can dynamic T-SQL help me here? If I could find out at run time how many data frames there are, and which ones I may or may not want, then why not? Here’s some R code that reads HTML tables into a variable as a list of data frames(line 8), iterates through the list (starting at line 18), decides if the HTML table has any data in it (lines 21, 24), and adds the HTML table number (the element number in the list) to a different data frame (line 27). The output shows us we would want HTML tables 1, 2, and 4. (Yeah, I really didn’t want #4. But that can be fixed by enhancing the R code to be more selective. Let’s just go with it for now.)

The method is a bit disappointing (and it’s arguably worse for inputs); I do hope the ML Services team can improve upon this experience.

Comments closed

Curated SQL Posts