Press "Enter" to skip to content

Author: Kevin Feasel

Changing Server Collations En Masse

Hugo Kornelis has a script to change server collations across a large number of objects:

Problem is: the new instance was set up with a default collation that was different from the default collation of the old instance. And hence different from the database default collation. And when that resulted in errors, they responded by changing the default collation. Now everything ran and they were happy.

Fast forward two years. The database now has lots of “old” tables that use the original default collation for all string columns. But also lots of “new” tables that have their string columns using the new default collation. And the upgrade failed because somewhere in the new code is a comparison between two character columns that now have incompatible collation.

Click through for the script, as well as the standard disclaimer never blindly to run things in production.

Comments closed

Rotating Expired TDE Certificates

Chris Bell shows how you can quickly rotate TDE certificates, hopefully before they expire:

We have expired or expiring SQL TDE certificates! What now?
Well, the first thing we do is not panic. Even if our TDE certificate expires it won’t cause any issues. The SQL Server will continue to work normally. Even if we restore the DB elsewhere using the expired certificate we will just get a warning that the certificate is expired.
A warning is nice, and the system still working let’s us breathe a little easier, but we know that an updated certificate is a much better thing to have. In fact, setting up a regular key rotation schedule is even better and a recommended practice.

Good information, and Chris shares scripts to make it easy.

Comments closed

Impossible Joins And Terrible Execution Plans

Erik Darling shows us a case where the same query can be nice and fast, but change one parameter and suddenly performance goes out the window:

In the original plan, the TOP asked for rows, and quickly got them.
In the second plan, the TOP kept asking for rows, getting them from the Votes table, and then losing them on the join to Posts.
There was no parameter sniffing, there were no out of date stats, no blocking, or any other oddities. It’s just plain bad luck because of the data’s relationship.

Read the whole thing.

Comments closed

Computer Internals and the Buffer Pool

Randolph West starts a new series on the buffer pool in SQL Server:

Now that we’ve reminded ourselves of those fundamentals, let’s take a closer look at the buffer pool.

The buffer pool in SQL Server resides in the computer’s main memory (RAM). When the database engine requests a data page for reading or writing, it is assumed to be in the buffer pool. The buffer pool itself controls access between RAM and storage. If the data page that the database engine requests is not in RAM, a request is sent to the storage engine to retrieve that page. This may be storage directly attached to the system, or via a network interface card.

This first post is a high-level overview, but it looks like there’s a lot more in store from Randolph.

Comments closed

AMD vs Intel CPUs For Data Processing Jobs

Hariharan Iyer and Abhishek Srivastava run some tests against AWS’s new AMD-powered EC2 instances:

Our summary findings from TPCDS benchmarks are as follows:
– TPCDS queries are not as sensitive to local disk performance (and hence to EBS volume sizes)
– r5 (Intel) instances are consistently faster than r5a (AMD) instances. However, the speedup depends on the engine and the speedup for r5 (Intel) is lower for Spark (10%) than for Hive (25%).
– r5 instances are also either cheaper (by about 10% for Hive) or the same cost (for Spark)

At least for Hadoop and Spark work, Intel CPUs are a bit better, but there is some nuance in the story so check it out.

Comments closed

Running RStudio Server In Azure

David Smith notes that RStudio Server Pro is now available on Azure:

RStudio Server Pro is now available on the Azure Marketplace, the company announced on the RStudio Blog earlier this month. This means you can launch RStudio Server Pro on an virtual machine with the memory, disk, and CPU configuration of your choice, and pay by the minute for the VM instance plus a the RStudio software charge. Then, you can use a browser to access the remote RStudio Server (the interface is nigh-indistinguishable from the desktop version), with access to the commercial features of RStudio including support for multiple R version and concurrent R sessions, load-balancing and high availability instances, and enhanced security.

RStudio Server Pro and Microsoft R Server are both very nice for production-quality R servers. You can get away with the open source versions, but there are some good reasons to use the enterprise-grade products in an enterprise.

Comments closed

Calculating Skew In SQL

Lukas Eder shows how you can use PERCENTILE_DISC to calculate skewness in SQL:

In RDBMS, we sometimes use the term skew colloquially to mean the same thing as non-uniform distribution, i.e. a normal distribution would also be skewed. We simply mean that some values appear more often than others. Thus, I will put the term “skew” in double quotes in this article. While your RDBMS’s statistics contain this information once they are calculated, we can also detect such “skew” manually in ad-hoc queries using percentiles, which are defined in the SQL standard and supported in a variety of databases, as ordinary aggregate functions, including:
– Oracle
– PostgreSQL
– SQL Server (regrettably, only as window functions)

As Lukas implies, SQL Server is a step behind in terms of calculating percentiles, and calculating several percentiles over a large data set will be slow. Very slow. Though batch mode processing in 2019 does help here.

Comments closed

Optimizing For Ad Hoc Workloads

Bert Wagner looks at the optimize for ad hoc workloads option in SQL Server:

Instead of filling the execution plan cache with plans that will never get reused, the optimize for ad hoc workloads option will cache a plan stub instead of the full plan. The plan stub is significantly smaller in size and is only replaced with the full execution plan when SQL Server recognizes that the same query has executed multiple times.

This reduces the amount of size one-time queries take up in t he cache, allowing more reusable plans to remain in the cache for longer periods of time.

I’ve run into several cases where this has helped SQL Server and don’t think I’ve found a scenario where it actively hurts.

Comments closed

Adding ML Services On Windows Server Core

Kevin Chant shows us how to add SQL Server ML Services to an already-existing SQL Server installation on Windows Server Core:

It’s important to try and use an install set that is the same level of Service pack as your current install. Otherwise, you could end up installing multiple patches to get the SQL Launchpad service to work. Which is something discussed in a previous post here.

I know some companies have a central installer for SQL Server and then have all the updates in another location. Hence, if you are in such an environment be prepared to run multiple updates from that location after the install.

This is definitely one of the features which is easier to install from the beginning than to install after the fact.

Comments closed

Early Thoughts On Scalar UDF Inlining

Aaron Bertrand shares some early thoughts on a SQL Server 2019 improvement:

Since, in spite of our best efforts since SQL Server 2000, we can’t effectively stop scalar UDFs from being used, wouldn’t it be great to make SQL Server simply handle them better?

SQL Server 2019 introduces a new feature called Scalar UDF Inlining. Instead of keeping the function separate, it is incorporated into the overall plan. This leads to a much better execution plan and, in turn, better runtime performance.

Read the whole thing. Especially the part about how this changes the way one of the DMV counters behaves.

Comments closed