Curated SQL – Page 1348 – A Fine Slice Of SQL Server

Soft-NUMA Doesn’t Limit MAXDOP

Published 2017-11-07 by Kevin Feasel

Lonny Niederstadt tests whether soft-NUMA forces MAXDOP = 1:

I mentioned that I was planning to set up a soft-NUMA node for each vcpu on a 16 vcpu VM, to evenly distribute incoming connections and thus DOP 1 queries over vcpus. Thomas Kejser et al used this strategy to good effect in “The Data Loading Performance Guide”, which used SQL Server 2008 as a base.
https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

My conversation partner cautioned me that leaving this soft-NUMA configuration in place after the specialized workload would result in DOP 1 queries whether I wanted them or not. The claim was, effectively, a parallel query plan generated by a connection within a soft-NUMA node would have its MAXDOP restricted by the scheduler count (if lower than other MAXDOP contributing factors). Though I wasn’t able to test at the time, I was skeptical: I’d always thought that soft-NUMA was consequential to connection placement, but not to MAXDOP nor to where parallel query workers would be assigned.

I’m back home now… time to test!!

Read on for the test.

Comments closed

Picking Azure VM Sizes

Published 2017-11-07 by Kevin Feasel

Glenn Berry helps us pick the right-sized Azure VM for a SQL Server installation:

A common issue with Azure VM sizing for SQL Server has been the fact that you were often forced to select a VM size that had far more virtual CPU cores than you needed or wanted in order to have enough memory and storage performance to support your workload, which increased your monthly licensing cost.

Luckily, Microsoft has recently made the decision process a little easier for SQL Server with a new series of Azure VMs that use some particular VM sizes (DS, ES, GS, and MS), but reduce the vCPU count to one quarter or one half of the original VM size, while maintaining the same memory, storage and I/O bandwidth. These these new VM sizes have a suffix that specifies the number of active vCPUs to make them easier to identify.

For example, a Standard_DS14v2 Azure VM would have 16 vCPUs, 112GB of RAM, and support up to 51,200 IOPS or 768MB/sec of sequential throughput (according to Microsoft). A new Standard_DS14-8v2 Azure VM would only have 8 vCPUs, with the same memory capacity and disk performance as the Standard_DS14v2, which would reduce your SQL Server licensing cost per year by 50%. Both of these Azure VM SKUs would have the same ACU score of 160.

Glenn is, as always, a font of useful information. Go read the whole thing.

Comments closed

Unresolved Reference In Database Project

Published 2017-11-07 by Kevin Feasel

Ed Elliott explains something fairly straightforward and gives us a primer on using Stack Overflow too:

If you build an SSDT project you can get an error which says:

“SQL71502: Function: [XXX].[XXX] has an unresolved reference to object [XXX].[XXX].”

If the code that is failing is trying to use something in the “sys” schema or the “INFORMATION_SCHEMA” schema then you need to add a database reference to the master dacpac:

Click through for the answer and a comment-by-comment walkthrough from Ed.

Comments closed

Window Function Sort Performance

Published 2017-11-07 by Kevin Feasel

Lukas Eder explains one potential issue with window functions against large data sets:

Usually, this blog is 100% pro window functions and advocates using them at any occasion. But like any tool, window functions come at a price and we must carefully evaluate if that’s a price we’re willing to pay. That price can be a sort operation. And as we all know, sort operations are expensive. They follow O(n log n) complexity, which must be avoided at all costs for large data sets.

In a previous post, I’ve described how to calculate a running total with window functions (among other ways). In this post, we’re going to calculate the cumulative revenue at each payment in our Sakila database.

This is a good article comparing how different RDBMS products handle a fairly complicated windowed query and what you can do to improve performance.

Comments closed

Multiple Result Sets With ML Services

Published 2017-11-07 by Kevin Feasel

Dave Mason figures out how to create multiple result sets with SQL Server ML Services:

Of course for this strategy to work, I’d have to know ahead of time how many data frames/HTML tables there are. Hmmm. Can dynamic T-SQL help me here? If I could find out at run time how many data frames there are, and which ones I may or may not want, then why not? Here’s some R code that reads HTML tables into a variable as a list of data frames(line 8), iterates through the list (starting at line 18), decides if the HTML table has any data in it (lines 21, 24), and adds the HTML table number (the element number in the list) to a different data frame (line 27). The output shows us we would want HTML tables 1, 2, and 4. (Yeah, I really didn’t want #4. But that can be fixed by enhancing the R code to be more selective. Let’s just go with it for now.)

The method is a bit disappointing (and it’s arguably worse for inputs); I do hope the ML Services team can improve upon this experience.

Comments closed

Power BI Report Server Scheduled Refresh

Published 2017-11-07 by Kevin Feasel

Wolfgang Strasser covers Power BI Report Server’s scheduled data refresh ability in a two-part series. First, he shows how to configure scheduled data refresh:

What needs to be added here is the fact that data refresh can be speeded up to 1 minute refresh intervals. (Which is much more often compared to 8 times for powerbi.com PRO users).

When you are done with the schedule configuration, save it and the definition dialog is updated with the plan information. The last data refresh timestamp plus the last status is also displayed.

Just like Reporting Services, data refresh happens via SQL Agent job. Wolfgang then shows us what’s in the Power BI Report Server’s metadata database:

CAUTION: The PBIRS meta-database is a system database needed for successfully running PBIRS. Do NOT modify objects in that database – a non-functioning PBIRS could be the result!

The table dbo.SubscriptionHistory contains the history of data refresh actions. A SELECT on this table provides you with the following information:

Subscription metadata (SubscriptionID plus Type)
Execution runtime information (HistoryID, StartTime, EndTime, Status plus Message)

I’m pretty jazzed about this feature.

Comments closed

stringr Cheat Sheet

Published 2017-11-06 by Kevin Feasel

David Smith points out a cheat sheet for dealing with strings in R:

The RStudio team has created another very useful cheat sheet for R: Working with Strings. This cheat sheet provides an example-laden menu of operations you can perform on strings (character verctors) in R using the stringr package. While base R provides a solid set of string manipulation functions, the stringr package functions are simpler, more consistent (making them easy to use with the pipe operator), and more like the Ruby or Python way of handling string operations.

Click through for a link to the PDF.

Comments closed

Partitioning Tables

Published 2017-11-06 by Kevin Feasel

Eleni Markou shows how to partition tables in Postgres, SQL Server, and Google’s BigQuery:

When it comes to Microsoft SQL Server things are a bit different as this database system does not support dynamic partitions and so partitioning the table manually can be a huge maintenance issue.

That being said, in order to create a partitioned table a similar procedure to the one previously presented must be followed. This time we will create a monthly partition.

Read on for scripts for each.

Comments closed

Cosmos DB Limitations

Published 2017-11-06 by Kevin Feasel

Vincent-Philippe Lauzon points out a few limitations with Cosmos DB:

The original DocumentDB SQL didn’t have any aggregation capacity. But it did acquire those capacities along the way.

Traditionally, that isn’t the strong spot for document-oriented databases. They tend to be more about find documents and manipulating the documents as oppose to aggregating metrics on a mass of documents.

Today, DocumentDB SQL implements the following aggregate functions:

COUNT
SUM
MIN
MAX
AVG

Read on for where the current aggregation limitation is, as well as more.

Comments closed

Is Azure SQL DW A Good Fit For You?

Published 2017-11-06 by Kevin Feasel

Melissa Coates has a nice choose-your-own-adventure story around Azure SQL Data Warehouse:

Q4: How large is your database?

It is difficult to pinpoint an exact number for the absolute minimum size recommended for Azure SQL DW. Many data professionals in the industry see the minimum “practical” data size for Azure SQL DW in the 1-4TB range. Microsoft documentation has recently stated as low as 250GB for a minimum size. Since Azure SQL DW is an MPP (massively parallel processing) system, you experience a significant performance penalty with small data sizes because of the overhead incurred to distribute and consolidate across the nodes (which are distributions in a “shared-nothing” architecture). We recommend Azure SQL DW for a data warehouse which is starting to approach 1TB and expected to continue growing.

Great advice here. I’ve heard too often of people looking at the name “Azure SQL Data Warehouse” and figuring that because they have data warehouses on-prem, this is the appropriate analog. Azure SQL DW is not a typical data warehousing environment; it’s more of a specialized tool than that, so click through to see if it fits your needs.

Comments closed

Curated SQL Posts

Q4: How large is your database?