At the SQL bits keynote today, we announced the release of SQL Server 2019 community technology preview 2.3, the fourth in a monthly cadency of preview releases. Previewed in September 2018, SQL Server 2019 is the first release of SQL Server to closely integrate Apache Spark™ and HDFS with SQL Server in a unified data platform.
There’s not a giant list but there are some interesting items on it. Click through for the full list.
When to use Blob vs ADLS Gen2
New analytics projects should use ADLS Gen2, and current Blob storage should be converted to ADLS Gen2, unless these are non-analytical use cases that only need object storage rather than hierarchical storage (i.e. video, images, backup files), in which case you can use Blob Storage and save a bit of money on transaction costs (storage costs will be the same between Blob and ADLS Gen2 but transaction costs will be a bit higher for ADLS Gen2 due to the overhead of namespaces).
Looks like there are still some things missing from Gen2, so don’t automatically jump on an upgrade. Read the documentation first to make sure you aren’t relying on something which isn’t there yet.
Sometimes you don’t have the perfect development environment for SQL Server, sometimes you deploy to things like SQL Azure and want to test locally, for various reasons it is possible that you want to deploy to one version of SQL Server but set the project properties to a different version of SQL Server. If you find yourself in this situation you might need to use the parameter AllowIncompatiblePlatform
which essentially says “I know I said the project is for SQL 2017 but I am deploying to SQL 2014”, anytime you say this you also sign the contract that says “If I am deploying to a lower version then I have made sure I haven’t used any keywords or object types that didn’t exist in that version and also if everything fails then so be it, I live with my choices every day”.
The story is a little complicated, but Ed straightens it out for us.
Wait! Before you install that next SQL Server, hold up. Are you sure you’re using the right version?
I know, management wants you to stay on an older build, and the vendor says they’ll only support older versions, but now’s your chance to make your case for a newer version – and I’m gonna help you do it.
I’m going to go from the dark ages forward, making a sales pitch for each newer version.
My branch logic is easier: if you need the data today, SQL Server 2017. If you need the data later this year, SQL Server 2019. If you hate your company and yourself, SQL Server 6.5.
Andy Levy has a problem. Well, about 8000 of them. In part 1, he describes the plan:
How do you move eight thousand databases in a reasonable amount of time? I spent about an hour and a half one morning hashing ideas out w/ folks in the dbatools Slack channel, plus several conversations in the office and with our hosting provider.
We missed the estimated time for our go/no-go decision by five minutes. With the number of moving parts, databases in play, unexpected delays, and amount of testing we had to do, that’s pretty good! My colleague and I had some additional work we needed to take care of after the team declared the migration a success. Agent jobs needed to be enabled, overnight job startups monitored, things like that. We called it a day after about 14 hours in the office.
It was a nice success story, so check it out.
When you look at articles, posts and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio in the version 18.0, also default configuration for Query Store attributes is changed too.
First SSMS 18.0. From this version, you can see another Query Store report – Query Wait Statistics. When you click on it, you can see aggregate waits per category in a given time interval (default is last hour).
It looks like there have been some incremental improvements to Query Store. I think the defaults also make a bit more sense.
R 3.5.2, the latest version of the R language for statistical computation and graphics from the R Foundation, was released today. (This release is codenamed “Eggshell Igloo”, likely in reference to this or this Peanuts cartoon.)
Click through for more detail. I’m probably just going to wait for R 4.0, myself.
The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.
Spark 2.4 introduced 24 new built-in functions, such as
array_max/min, etc., and 5 higher-order functions, such as
The entire list can be found here.
Earlier, for manipulating the complex types (e.g. array type) directly, there are two typical solutions:
1) exploding the nested structure into individual rows, and applying some functions, and then creating the structure again.
2) building a User Defined Function (UDF).
In contrast, the new built-in functions can directly manipulate complex types, and the higher-order functions can manipulate complex values with an anonymous lambda function similar to UDFs but with much better performance.
2.4 was a big release, so check this out for a great summary of the improvements it brings.
In this post, I share my approach and code snippets for:
Getting SQL Server 2019
Running SQL Server 2019 in a Docker Container
Restoring Demo Databases (AdventureWorks and WideWorldImporters)
If your hardware supports Docker, this is a great way of getting some experience with a new version of SQL Server without the mess of cleaning up after a CTP or affecting your current dev environment.
The question is now how to perform a coarse estimation of the equijoin cardinality of the highlighted steps, using the information available.
The original cardinality estimator would have performed a fine-grained step-by-step histogram alignment using linear interpolation, assessed the join contribution of each step (much as we did for the minimum step value before), and summed each step contribution to acquire a full join estimate. While this procedure makes a lot of intuitive sense, practical experience was that this fine-grained approach added computational overhead and could produce results of variable quality.
The original estimator had another way to estimate join cardinality when histogram information was either not available, or heuristically assessed to be inferior. This is known as a frequency-based estimation[…]
It’s an interesting dive into one of the changes in 2014’s CE. The post is a little math-heavy but Paul does a great job keeping it interesting.