Wait! Before you install that next SQL Server, hold up. Are you sure you’re using the right version?
I know, management wants you to stay on an older build, and the vendor says they’ll only support older versions, but now’s your chance to make your case for a newer version – and I’m gonna help you do it.
I’m going to go from the dark ages forward, making a sales pitch for each newer version.
My branch logic is easier: if you need the data today, SQL Server 2017. If you need the data later this year, SQL Server 2019. If you hate your company and yourself, SQL Server 6.5.
Andy Levy has a problem. Well, about 8000 of them. In part 1, he describes the plan:
How do you move eight thousand databases in a reasonable amount of time? I spent about an hour and a half one morning hashing ideas out w/ folks in the dbatools Slack channel, plus several conversations in the office and with our hosting provider.
We missed the estimated time for our go/no-go decision by five minutes. With the number of moving parts, databases in play, unexpected delays, and amount of testing we had to do, that’s pretty good! My colleague and I had some additional work we needed to take care of after the team declared the migration a success. Agent jobs needed to be enabled, overnight job startups monitored, things like that. We called it a day after about 14 hours in the office.
It was a nice success story, so check it out.
When you look at articles, posts and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio in the version 18.0, also default configuration for Query Store attributes is changed too.
First SSMS 18.0. From this version, you can see another Query Store report – Query Wait Statistics. When you click on it, you can see aggregate waits per category in a given time interval (default is last hour).
It looks like there have been some incremental improvements to Query Store. I think the defaults also make a bit more sense.
R 3.5.2, the latest version of the R language for statistical computation and graphics from the R Foundation, was released today. (This release is codenamed “Eggshell Igloo”, likely in reference to this or this Peanuts cartoon.)
Click through for more detail. I’m probably just going to wait for R 4.0, myself.
The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.
Spark 2.4 introduced 24 new built-in functions, such as
array_max/min, etc., and 5 higher-order functions, such as
The entire list can be found here.
Earlier, for manipulating the complex types (e.g. array type) directly, there are two typical solutions:
1) exploding the nested structure into individual rows, and applying some functions, and then creating the structure again.
2) building a User Defined Function (UDF).
In contrast, the new built-in functions can directly manipulate complex types, and the higher-order functions can manipulate complex values with an anonymous lambda function similar to UDFs but with much better performance.
2.4 was a big release, so check this out for a great summary of the improvements it brings.
In this post, I share my approach and code snippets for:
Getting SQL Server 2019
Running SQL Server 2019 in a Docker Container
Restoring Demo Databases (AdventureWorks and WideWorldImporters)
If your hardware supports Docker, this is a great way of getting some experience with a new version of SQL Server without the mess of cleaning up after a CTP or affecting your current dev environment.
The question is now how to perform a coarse estimation of the equijoin cardinality of the highlighted steps, using the information available.
The original cardinality estimator would have performed a fine-grained step-by-step histogram alignment using linear interpolation, assessed the join contribution of each step (much as we did for the minimum step value before), and summed each step contribution to acquire a full join estimate. While this procedure makes a lot of intuitive sense, practical experience was that this fine-grained approach added computational overhead and could produce results of variable quality.
The original estimator had another way to estimate join cardinality when histogram information was either not available, or heuristically assessed to be inferior. This is known as a frequency-based estimation[…]
It’s an interesting dive into one of the changes in 2014’s CE. The post is a little math-heavy but Paul does a great job keeping it interesting.
Sometime in the next month, I’ll also be updating Start-DbaMigration to more closely match the parameters of Export-DbaInstance. Parameters like
NoLoginswill be replaced by
-Exclude Databases, Logins.
So the functionality won’t necessarily change, but if you have scheduled tasks or scripts that perform migrations, you will need to update your parameters once you update dbatools once these changes are made.
Keep an eye out for all of these changes if you’re a regular dbatools user or have processes scripted.
SQL Server 2019 big data clusters provide a complete AI platform. Data can be easily ingested via Spark Streaming or traditional SQL inserts and stored in HDFS, relational tables, graph, or JSON/XML. Data can be prepared by using either Spark jobs or Transact-SQL (T-SQL) queries and fed into machine learning model training routines in either Spark or the SQL Server master instance using a variety of programming languages, including Java, Python, R, and Scala. The resulting models can then be operationalized in batch scoring jobs in Spark, in T-SQL stored procedures for real-time scoring, or encapsulated in REST API containers hosted in the big data cluster.
SQL Server big data clusters provide all the tools and systems to ingest, store, and prepare data for analysis as well as to train the machine learning models, store the models, and operationalize them.
Data can be ingested using Spark Streaming, by inserting data directly to HDFS through the HDFS API, or by inserting data into SQL Server through standard T-SQL insert queries. The data can be stored in files in HDFS, or partitioned and stored in data pools, or stored in the SQL Server master instance in tables, graph, or JSON/XML. Either T-SQL or Spark can be used to prepare data by running batch jobs to transform the data, aggregate it, or perform other data wrangling tasks.
Data scientists can choose either to use SQL Server Machine Learning Services in the master instance to run R, Python, or Java model training scripts or to use Spark. In either case, the full library of open-source machine learning libraries, such as TensorFlow or Caffe, can be used to train models.
Lastly, once the models are trained, they can be operationalized in the SQL Server master instance using real-time, native scoring via the PREDICT function in a stored procedure in the SQL Server master instance; or you can use batch scoring over the data in HDFS with Spark. Alternatively, using tools provided with the big data cluster, data engineers can easily wrap the model in a REST API and provision the API + model as a container on the big data cluster as a scoring microservice for easy integration into any application.
I’ve wanted Spark integration ever since 2016 and we’re going to get it.
One of the most popular posts on my blog last year was where I pretty much suggested that people not use table variables:
This wasn’t new information when I wrote it, but bad performance due to the use of table variables remained such a common anti-pattern that I thought it was worth stressing again.
So, when I saw the above 2019 feature I thought I’d better investigate and update what I’m telling people.
TL;DR It looks like table variables are no longer a problem.
Read the whole thing. This has the potential of changing long-standing advice going back a decade regarding table variables.