
Curated SQL Posts

Azure SQL Database Compatibility Level Change

Joe Sack reports that new Azure SQL Databases will have a compatibility level of 140 by default:

Once this new database compatibility default goes into effect, if you still wish to use database compatibility level 130 (or lower), please follow the instructions detailed here: View or Change the Compatibility Level of a Database.  For example, you may wish to ensure that new databases created in Azure SQL Database use the same compatibility level as other databases in Azure SQL Database to ensure consistent query optimization behavior across development, QA and production versions of your databases. We recommend that database configuration scripts explicitly designate COMPATIBILITY_LEVEL rather than rely on the defaults, in order to ensure consistent application behavior.

For new databases supporting new applications, we recommend using the latest compatibility level (140).  For pre-existing databases running at lower compatibility levels, the recommended workflow for upgrading the query processor to a higher compatibility level is detailed in the article, Change the Database Compatibility Mode and Use the Query Store.  Note that this article refers to compatibility level 130 and SQL Server, but the same methodology applies for moves to 140 for SQL Server and Azure SQL DB.

It’s good to hear, and as Joe mentions, you can move back down to 130 if you need to.
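If you do want to designate the level explicitly (or step back down), the T-SQL is short; a minimal sketch, using a hypothetical database name:

-- Check where a database currently sits
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'MyAppDb';

-- Designate the level explicitly rather than relying on the default
ALTER DATABASE [MyAppDb] SET COMPATIBILITY_LEVEL = 140;

-- Or step back down to 130 if regressions show up
ALTER DATABASE [MyAppDb] SET COMPATIBILITY_LEVEL = 130;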


Selecting Into Tables, Sans Identity

Kenneth Fisher shows a couple of ways to remove an identity property from a column when creating a new table:

A while back I did a post about creating an empty table using a SELECT statement. Basically doing something like this:

SELECT TOP 0 * INTO tableNameArchive FROM tableName

will create a new table with the exact same structure as the source table. It can be a really handy way to create an archive table, a temp table, etc. You don’t create any of the extra objects (indexes, triggers, constraints etc) but what you do end up with is every table property from the original table. This includes datatypes, nullability, and (as I’m sure you realized from the title) IDENTITY. Which if you are creating an archive table, a temp table, etc is probably not something you want. Fortunately, there are two easy ways to get rid of the identity.

Click through to see those two methods.
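Without spoiling Kenneth’s two methods, it is worth knowing the documented SELECT … INTO behavior at play: when the identity column comes through an expression rather than being copied directly, the new column does not inherit the IDENTITY property. A rough sketch with hypothetical column names (Id is the identity column; OtherColumn stands in for the rest):

SELECT TOP 0
       ISNULL(Id, 0) AS Id,   -- wrapping the column in an expression strips IDENTITY; ISNULL also keeps it NOT NULL
       OtherColumn
INTO tableNameArchive
FROM tableName;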


Early Thoughts On New AMD CPUs

Glenn Berry talks about the new AMD processor lines and how they might work with SQL Server:

AMD is really pushing the idea of a single-socket EPYC system as a better alternative to a two-socket Intel system for many server workloads. According to AMD, it will be much less expensive, yet will have plenty of cores, memory, and PCIe 3.0 lanes, along with no NUMA overhead. One key advantage AMD is touting is their Infinity Fabric modular interconnect technology, that works both within a single processor and between multiple processors.

For SQL Server 2016/2017 usage, you would still want the “top of the line” SKU for a given physical core count, to get the most performance for each physical core license that you buy. Unlike Intel, AMD does not increase the base clock speed in the lower core count models. These EPYC systems have a lot of PCIe 3.0 lanes and very high memory density, so they might work really well for large SQL Server DW/Reporting workloads. For OLTP workloads, the key will be how much single-threaded performance AMD is able to get from this first-generation of EPYC, and how they compare to Intel’s new Skylake-SP processors. Figure 3 shows the fastest EPYC processor at each core count, which is what you would want for SQL Server usage.

There aren’t too many hard numbers yet, but the worst case scenario is that they force Intel to improve their offerings.


Learning Spark Structured Streaming

Jules Damji has a nice compendium of links and additional resources for people wanting to learn more about Apache Spark’s Structured Streaming:

Structured Streaming In Apache Spark: A new high-level API for streaming

Databricks’ engineers and Apache Spark committers Matei Zaharia, Tathagata Das, Michael Armbrust and Reynold Xin expound on why streaming applications are difficult to write, and how Structured Streaming addresses all the underlying complexities.

There’s quite a bit of reading material on the other side.


Simpson’s Paradox Explained

Mehdi Daoudi, et al., have a nice explanation of Simpson’s Paradox:

E.H. Simpson first described the phenomenon of Simpson’s paradox in 1951. The actual name “Simpson’s paradox” was introduced by Colin R. Blyth in 1972. Blyth mentioned that:

G.W. Haggstrom pointed out that Simpson’s paradox is the simplest form of the false correlation paradox in which the domain of x is divided into short intervals, on each of which y is a linear function of x with large negative slope, but these short line segments get progressively higher to the right, so that over the whole domain of x, the variable y is practically a linear function of x with large positive slope.

The authors also provide a helpful example with operational metrics, showing how aggregating the data leads to an opposite (and invalid) conclusion.
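To see the reversal in miniature, here is a toy T-SQL illustration with invented numbers (not the authors’ data): per endpoint, server X is faster in both cases, yet the blended average makes Y look much better simply because Y serves mostly fast calls.

DECLARE @timings TABLE (server_name char(1), endpoint_name varchar(10), requests int, avg_ms int);

INSERT INTO @timings VALUES
('X', 'fast',  100,  100),
('X', 'slow', 1000, 1000),
('Y', 'fast', 1000,  110),
('Y', 'slow',  100, 1100);

-- Per endpoint: X wins both comparisons (100 vs 110 ms, 1000 vs 1100 ms)
SELECT endpoint_name, server_name, avg_ms
FROM @timings
ORDER BY endpoint_name, server_name;

-- Blended: Y appears far faster (~200 ms vs ~918 ms) purely because of the traffic mix
SELECT server_name,
       SUM(1.0 * requests * avg_ms) / SUM(requests) AS blended_avg_ms
FROM @timings
GROUP BY server_name;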


Gradient Boosting In R

Anish Sing Walia walks us through a gradient boosting exercise using R:

An important thing to remember in boosting is that the base learner which is being boosted should not be a complex and complicated learner which has high variance, for e.g. a neural network with lots of nodes and high weight values. For such learners boosting will have inverse effects.

So I will explain Boosting with respect to decision trees in this tutorial because they can be regarded as weak learners most of the times. We will generate a gradient boosting model.

Click through for more details.  H/T R-Bloggers


Columnstore Predicate Pushdown Tipping Point

Lonny Niederstadt is hot on the trail, looking for evidence of a tipping point for COUNT(*) aggregates performing pushdown against a clustered columnstore index:

Below is what we want post-execution plans to look like when counting rows in a range – the thin arrow coming out of the Columnstore scan is a hint that predicate pushdown was successful.  I didn’t specify MAXDOP in a query hint, and Resource Governor isn’t supplying MAXDOP; MAXDOP 8 is coming from the Database Scoped Configuration.  The degree of parallelism turns out to be a significant factor in determining the tipping point.  The [key] column is a BigInt.  Maybe it’s surprising that I’m using 27213.9 as the upper bound.  But… check out the estimated number of rows 🙂 Again – this estimate is coming from the Legacy CE, specified in the database scoped configuration for my database.

Interesting findings, although it looks like the specific values will be more dependent on your settings than findings of this nature usually are.
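For reference, the general shape of the query being probed looks roughly like the sketch below; the table name is a placeholder rather than Lonny’s actual test object, and the MAXDOP hint merely stands in for the database scoped configuration he relied on:

SELECT COUNT(*)
FROM dbo.ccs_table            -- hypothetical table with a clustered columnstore index
WHERE [key] >= 0
  AND [key] <= 27213.9        -- widen or narrow the range while watching for the pushdown tipping point
OPTION (MAXDOP 8);            -- degree of parallelism was a significant factor in his tests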


On Global Temp Tables

Denis Gobo riffs on global temp tables:

SQL Azure has added something called database scoped global temporary tables.

Azure SQL Database supports global temporary tables that are also stored in tempdb and scoped to the database level. This means that global temporary tables are shared for all users’ sessions within the same Azure SQL database. User sessions from other Azure SQL databases cannot access global temporary tables.

The way you add these is by using a double pound sign.

Click through for more.  I will occasionally use a global temp table, mostly in conjunction with Central Management Server queries going over a set of databases on each instance, but I limit my usage to manual operations and nothing as part of a normal application or administrative process.
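If you haven’t used one in a while, the syntax is just a regular CREATE TABLE with a double pound sign; the table and column names below are made up:

CREATE TABLE ##ServerCheckResults
(
    DatabaseName sysname       NOT NULL,
    CheckName    nvarchar(100) NOT NULL,
    CheckedAt    datetime2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
);

INSERT INTO ##ServerCheckResults (DatabaseName, CheckName)
VALUES (N'SalesDb', N'Orphaned users');

-- Other sessions can read it too (within the same Azure SQL database for the scoped flavor,
-- or instance-wide on box SQL Server) until the creating session ends and nothing else references it
SELECT DatabaseName, CheckName, CheckedAt
FROM ##ServerCheckResults;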


Columnstore Segment Alignment

Niko Neugebauer walks us through a new segment alignment detection function he has written:

There are 3 important factors that I use for the determination of the column that is really good for the Segment Elimination and hence the Segment Alignment:
– The support for the Segment Elimination (and then for the Predicate Pushdown). If the data type does not support Segment Elimination, then why would someone optimise for it?
– The frequency with which the column is used in the predicates (not in the joins, because this is where generally the Segment Elimination/Predicate Pushdown does not function)
– The number of the distinct values within a table/partition (if we have more Segments than distinct values, it is not a very good sign generally: example – 10 million rows with 5 distinct values)

Read on for more details.
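Niko’s function does the heavy lifting, but if you want to eyeball the underlying segment metadata yourself, something along these lines works; the table name is a placeholder, and keep in mind that min_data_id/max_data_id hold encoded identifiers rather than raw values for dictionary-encoded columns:

SELECT p.partition_number,
       s.column_id,
       s.segment_id,
       s.row_count,
       s.min_data_id,
       s.max_data_id
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY s.column_id, s.segment_id;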


Waits With Outsized Importance

Brent Ozar highlights a few wait types that belong on your “uh-oh” list even when their relative percentage of overall waits is small:

RESOURCE_SEMAPHORE_QUERY_COMPILE – this means a query came in, and SQL Server didn’t have an execution plan cached for it. In order to build an execution plan, SQL Server needs a little memory – not a lot, just a little – but that memory wasn’t available. SQL Server had to wait for memory to become available before it could even build an execution plan. For more details and a reproduction script, check out my Bad Idea Jeans: Dynamically Generating Ugly Queries post. In this scenario, cached query plans (and small ones) may be able to proceed just fine (depending on how much pressure the server is under), but the ugly ones will feel frozen.

Read on for a couple more examples of poison wait types.
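To check how much of this kind of wait your server has accumulated since the last restart (or since the stats were cleared), sys.dm_os_wait_stats is the place to look; the wait types below are illustrative poison waits, not necessarily the exact set Brent covers:

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'RESOURCE_SEMAPHORE_QUERY_COMPILE',
                    N'RESOURCE_SEMAPHORE',
                    N'THREADPOOL')
ORDER BY wait_time_ms DESC;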
