
Category: Versions

Query Store Changes

Milos Radivojevic shows us the Query Store default values and how they’ve changed between SQL Server 2017 and SQL Server 2019:

When you look at articles, posts, and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio 18.0, and the default configuration for Query Store attributes has changed as well.
First, SSMS 18.0. Starting with this version, you can see another Query Store report – Query Wait Statistics. When you click on it, you can see aggregate waits per category in a given time interval (the default is the last hour).

It looks like there have been some incremental improvements to Query Store. I think the defaults also make a bit more sense.
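If you would rather not depend on version-specific defaults, you can read the settings in effect and pin them explicitly. A minimal T-SQL sketch (the database name and option values here are illustrative placeholders, not the documented defaults for either version):

```sql
-- Inspect the Query Store settings currently in effect for this database
SELECT actual_state_desc,
       query_capture_mode_desc,
       max_storage_size_mb,
       interval_length_minutes
FROM sys.database_query_store_options;

-- Pin the options explicitly instead of inheriting version-specific defaults
-- (MyDatabase and the values below are placeholders)
ALTER DATABASE MyDatabase
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     QUERY_CAPTURE_MODE = AUTO,
     MAX_STORAGE_SIZE_MB = 1000,
     INTERVAL_LENGTH_MINUTES = 60);
```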


Summarizing Improvements In Spark 2.4

Anmol Sarna summarizes Apache Spark 2.4 and pushes his meme game at the same time:

The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.
Spark 2.4 introduced 24 new built-in functions, such as array_union, array_max/min, etc., and 5 higher-order functions, such as transform, filter, etc.
The entire list can be found here.
Earlier, for manipulating complex types (e.g., array types) directly, there were two typical solutions:
1) exploding the nested structure into individual rows, applying some functions, and then creating the structure again, or
2) building a user-defined function (UDF).
In contrast, the new built-in functions can directly manipulate complex types, and the higher-order functions can manipulate complex values with an anonymous lambda function similar to UDFs but with much better performance.

2.4 was a big release, so check this out for a great summary of the improvements it brings.
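As a quick taste of the quoted functions, all of these run directly in spark-sql on 2.4 or later (the array literals are arbitrary):

```sql
-- Spark SQL, 2.4+: array functions and higher-order functions work on
-- complex types directly, with no explode/regroup dance and no UDF
SELECT
  array_union(array(1, 2, 3), array(3, 4))  AS unioned,  -- [1, 2, 3, 4]
  array_max(array(1, 5, 2))                 AS largest,  -- 5
  transform(array(1, 2, 3), x -> x * 2)     AS doubled,  -- [2, 4, 6]
  filter(array(1, 2, 3, 4), x -> x % 2 = 0) AS evens;    -- [2, 4]
```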


Running SQL Server 2019 In A Docker Container

Cathrine Wilhelmsen shows us how to set up a Docker container running SQL Server 2019 on Linux:

In this post, I share my approach and code snippets for:

  1. Installing Docker

  2. Getting SQL Server 2019

  3. Running SQL Server 2019 in a Docker Container

  4. Restoring Demo Databases (AdventureWorks and WideWorldImporters)

If your hardware supports Docker, this is a great way of getting some experience with a new version of SQL Server without the mess of cleaning up after a CTP or affecting your current dev environment.
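As a taste of step 4: once a backup file has been copied into the container, the restore itself is plain T-SQL. A sketch, assuming the standard WideWorldImporters backup sitting in an assumed path inside the container; verify the logical file names with RESTORE FILELISTONLY before relying on them:

```sql
-- List the logical files inside the backup first (path is an assumption)
RESTORE FILELISTONLY
FROM DISK = N'/var/opt/mssql/backup/WideWorldImporters-Full.bak';

-- Restore, relocating files to the container's default data directory
-- (logical file names below are what the standard backup ships with,
-- but confirm them against the FILELISTONLY output)
RESTORE DATABASE WideWorldImporters
FROM DISK = N'/var/opt/mssql/backup/WideWorldImporters-Full.bak'
WITH MOVE N'WWI_Primary'  TO N'/var/opt/mssql/data/WideWorldImporters.mdf',
     MOVE N'WWI_UserData' TO N'/var/opt/mssql/data/WideWorldImporters_UserData.ndf',
     MOVE N'WWI_Log'      TO N'/var/opt/mssql/data/WideWorldImporters.ldf',
     MOVE N'WWI_InMemory_Data_1' TO N'/var/opt/mssql/data/WideWorldImporters_InMemory_Data_1';
```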


Join Estimation: Details On A Cardinality Estimator Change

Paul White shares some detail on a change to the Cardinality Estimator in SQL Server 2014:

The question is now how to perform a coarse estimation of the equijoin cardinality of the highlighted steps, using the information available.

The original cardinality estimator would have performed a fine-grained step-by-step histogram alignment using linear interpolation, assessed the join contribution of each step (much as we did for the minimum step value before), and summed each step contribution to acquire a full join estimate. While this procedure makes a lot of intuitive sense, practical experience was that this fine-grained approach added computational overhead and could produce results of variable quality.

The original estimator had another way to estimate join cardinality when histogram information was either not available, or heuristically assessed to be inferior. This is known as a frequency-based estimation[…]

It’s an interesting dive into one of the changes in 2014’s CE.  The post is a little math-heavy but Paul does a great job keeping it interesting.
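For orientation, the frequency-based style of estimate Paul describes is in the spirit of the classic distinct-count formula, which ignores histogram steps entirely (a simplified illustration of the idea, not the exact calculation the CE performs):

\[
|R \bowtie S| \;\approx\; \frac{|R| \cdot |S|}{\max(d_R,\, d_S)}
\]

where \(d_R\) and \(d_S\) are the distinct-value counts of the join column in each input. For example, 1,000 rows joined to 5,000 rows on a column with at most 200 distinct values gives 1,000 × 5,000 / 200 = 25,000 rows.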


Breaking Changes Coming To dbatools

Chrissy LeMaire warns us about breaking changes coming to dbatools with release 1.0:

Sometime in the next month, I’ll also be updating Start-DbaMigration to more closely match the parameters of Export-DbaInstance. Parameters like NoDatabases and NoLogins will be replaced by -Exclude Databases, Logins.

So the functionality won’t necessarily change, but if you have scheduled tasks or scripts that perform migrations, you will need to update your parameters when you upgrade dbatools after these changes are made.

Keep an eye out for all of these changes if you’re a regular dbatools user or have processes scripted.


Hadoop + SQL Server In 2019

Travis Wright shows off a big part of what the SQL Server team has been working on the last couple of years:

SQL Server 2019 big data clusters provide a complete AI platform. Data can be easily ingested via Spark Streaming or traditional SQL inserts and stored in HDFS, relational tables, graph, or JSON/XML. Data can be prepared by using either Spark jobs or Transact-SQL (T-SQL) queries and fed into machine learning model training routines in either Spark or the SQL Server master instance using a variety of programming languages, including Java, Python, R, and Scala. The resulting models can then be operationalized in batch scoring jobs in Spark, in T-SQL stored procedures for real-time scoring, or encapsulated in REST API containers hosted in the big data cluster.

SQL Server big data clusters provide all the tools and systems to ingest, store, and prepare data for analysis as well as to train the machine learning models, store the models, and operationalize them.
Data can be ingested using Spark Streaming, by inserting data directly to HDFS through the HDFS API, or by inserting data into SQL Server through standard T-SQL insert queries. The data can be stored in files in HDFS, or partitioned and stored in data pools, or stored in the SQL Server master instance in tables, graph, or JSON/XML. Either T-SQL or Spark can be used to prepare data by running batch jobs to transform the data, aggregate it, or perform other data wrangling tasks.

Data scientists can choose either to use SQL Server Machine Learning Services in the master instance to run R, Python, or Java model training scripts or to use Spark. In either case, the full library of open-source machine learning libraries, such as TensorFlow or Caffe, can be used to train models.

Lastly, once the models are trained, they can be operationalized in the SQL Server master instance using real-time, native scoring via the PREDICT function in a stored procedure in the SQL Server master instance; or you can use batch scoring over the data in HDFS with Spark. Alternatively, using tools provided with the big data cluster, data engineers can easily wrap the model in a REST API and provision the API + model as a container on the big data cluster as a scoring microservice for easy integration into any application.

I’ve wanted Spark integration ever since 2016 and we’re going to get it.
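The native-scoring path mentioned at the end looks roughly like this in T-SQL; the model table, model name, and column names below are hypothetical, and the model must be serialized in a format PREDICT supports, but the shape of the call is real:

```sql
-- Real-time scoring inside the engine: no round-trip to an external runtime
-- (dbo.Models, dbo.NewCustomers, and the columns are placeholders)
DECLARE @model varbinary(max) =
    (SELECT model_object FROM dbo.Models WHERE model_name = N'churn_v1');

SELECT d.customer_id,
       p.churn_probability
FROM PREDICT(MODEL = @model, DATA = dbo.NewCustomers AS d)
WITH (churn_probability float) AS p;
```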


Improvements In Table Variable Performance In SQL Server 2019

Matthew McGiffen tries out SQL Server 2019 to test a scenario where table variables were giving poor estimates in prior versions:

One of the most popular posts on my blog last year was where I pretty much suggested that people not use table variables:

Think twice before using table variables

This wasn’t new information when I wrote it, but bad performance due to the use of table variables remained such a common anti-pattern that I thought it was worth stressing again.

So, when I saw the above 2019 feature I thought I’d better investigate and update what I’m telling people.

TL;DR It looks like table variables are no longer a problem.

Read the whole thing. This has the potential to change long-standing advice going back a decade regarding table variables.
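The 2019 feature in question is deferred compilation for table variables under compatibility level 150: the statement referencing the table variable is compiled after the variable is populated, so the optimizer sees a real row count instead of the old fixed guess of one row. A small repro sketch (any multi-row source works):

```sql
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
GO
DECLARE @ids TABLE (id int PRIMARY KEY);

INSERT @ids (id)
SELECT object_id FROM sys.objects;   -- many rows, not one

-- Pre-2019 this join is costed as though @ids held a single row;
-- with deferred compilation the estimate reflects the actual count
SELECT o.name
FROM sys.objects AS o
JOIN @ids AS i
    ON i.id = o.object_id;
```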


What’s In SQL Server 2019 CTP 2.0?

Aaron Bertrand gives us the highlights:

  • Certificate Management in Config Manager: View and validate all of your certificates from a single interface, and manage and deploy certificate changes across all of the replicas in an Availability Group or all of the nodes in a Failover Cluster Instance.

  • Built-in data classification: A new ADD SENSITIVITY CLASSIFICATION statement helps you identify and automatically audit sensitive data, a huge step up from the previous SSMS wizard (which just used extended properties); see the sketch after this list.
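A minimal sketch of the new statement (the table and column are hypothetical; LABEL and INFORMATION_TYPE are free-text values you choose):

```sql
-- Classify a column; the metadata is then available to auditing
ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.Email
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');

-- Review what has been classified in the database
SELECT * FROM sys.sensitivity_classifications;
```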

Aaron also digs into the engine a bit:

APPROX_COUNT_DISTINCT

This new aggregate function is designed for data warehouse scenarios, and is an equivalent for COUNT(DISTINCT()). Instead of performing expensive distinct sort operations to determine actual counts, it relies instead on statistics to get something relatively accurate. You should find that the margin of error is within 2% of the precise count, 97% of the time, which is usually fine for high-level analytics, values that populate a dashboard, or quick estimates.

On my system I created a table with integer columns ranging from 100 to 1,000,000 unique values, and string columns ranging from 100 to 100,000 unique values. There were no indexes other than a clustered primary key on the leading integer column. Here are the results of COUNT(DISTINCT()) vs. APPROX_COUNT_DISTINCT() against those columns, so you can see where it is off by a bit (but always well within 2%)[…]

By the way, APPROX_COUNT_DISTINCT() is a really good idea, and I’m glad it’s here.
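Trying the comparison yourself is a one-line swap (the table and column here are hypothetical):

```sql
-- Exact vs. approximate distinct counts side by side
SELECT COUNT(DISTINCT CustomerKey)        AS exact_count,
       APPROX_COUNT_DISTINCT(CustomerKey) AS approx_count
FROM dbo.FactSales;
```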


SSMS 17.9 Released

Alan Yu announces a new version of SQL Server Management Studio:

SSMS 17.9 provides support for almost all feature areas on SQL Server 2008 through the latest SQL Server 2017, which is now generally available.

In addition to enhancements and bug fixes, SSMS 17.9 comes with several new features:

  • ShowPlan improvements
  • Azure SQL support for vCore SKUs
  • Bug Fixes

View the Release Notes for more information.

It looks like the big push for this release was bug fixes, and there are quite a few of them.
