Press "Enter" to skip to content

Month: June 2024

Don’t Enable TRUSTWORTHY on SQL Server

Jeff Iannucci shares good advice:

If you have ever used our free tool to check SQL Server security, you may have seen the check for the “TRUSTWORTHY database owned by sysadmin” show up as one of the highest of priority items, requiring action. When we started reviewing the security permissions and configurations for our clients’ instances, we didn’t expect to find it very often since TRUSTWORTHY database setting is off by default.

Unfortunately, this has been discovered with some frequency, and when combined with a few other common practices, it presents a tremendous vulnerability to escalate privileges for both authorized users and hackers.

Read on to learn more about this. And to supplement, I will once again link Solomon Rutzky’s outstanding guide on the topic.

Leave a Comment

Approximate Percentiles in SQL Server 2022

Chad Callihan tries out a big improvement:

How do you go about finding the median percentile of a data set? What if you need the top x percentile? Both the APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC functions can be used to solve these questions.

Let’s look at how we can use each and what makes them unique.

The approximate percentiles are guaranteed to be accurate to within a certain percentage, something like 3-5%, if I remember correctly–it’s higher than HyperLogLog’s ~2.5% but not so large as to be of low value. If you’ve ever tried to calculate a median or other percentile like the 75th or 95th percentile, you might have used PERCENTILE_CONT() in the past. At least until you get a few million rows in the table, at which point you stopped using it. My joke is, once you reach a certain table size, PERCENTILE_CONT() becomes so slow that it’s faster to install and configure SQL Server ML Services, learn R or Python, and send in the data to calculate a percentile than to wait for PERCENTILE_CONT() to complete.

The APPROX_PERCENTILE_* series is way, way faster. On reasonable-sized test cases of a couple million rows or so, my recollection is two orders of magnitude better performance, so long as you can deal with being off by a few percentage points. One of the best scenarios for something like this is calculating 95th percentile response times. Does it really matter that the actual response time was 187.5ms and SQL Server said 192.6 or 181.4? Probably not—you get a good idea of the magnitude, and that’s the important part here.

Leave a Comment

An Overview of Spark in Microsoft Fabric

Reza Rad gives people a primer on Apache Spark:

Microsoft Fabric runs some workloads under the Spark engine, but what is it really? In this article, I’ll take you through the question of what Spark is, What benefits it has, how it is associated with Fabric, what configurations you have, and other things you need to know about it.

Reza talks a bit about history, interaction with languages, etc. As a quick addition to the languages list, you can use .NET languages like F# and C# with Spark, though it does involve setting up dotnet/spark and there are some open questions about its future. And I’m not even sure you could get it to work with Microsoft Fabric.

Leave a Comment

Restoring a SQL Server 2000 Backup in 2024

Randolph West turns back the clock:

Problem statement: I have a SQL Server 2000 database backup that I need to restore to a supported version of SQL Server (preferably SQL Server 2022). How do I do that?

Read on for Randolph’s answer on how to do this. Pre-2008, things may get a little shaky, as that’s back when Microsoft really deprecated functionality. So even if you do restore a backup using this technique, I do wonder what happens if you were using a deprecated datatype like NTEXT or TIMESTAMP. As of SQL Server 2022, those are still in the product, so I don’t have a good example so much as a bit of niggling paranoia.

Leave a Comment

Security Tips for Backups in SQL Server

Mike Walsh shares a few tips:

There are obviously many things to check when it comes to SQL Server security. We’re talking about a lot of those elements of security this month. Today I want to talk a little more about backups as it relates to their role in securing your SQL Servers.

When it comes to your backups and security in SQL Server –>

Read on for five tips to help ensure your backup strategy is doing what you intend it to do, but is also not a real attack vector at the same time.

Leave a Comment

Advance Notifications for Azure SQL MI

Uros Milanovic gives us a heads up:

Advance notifications allow you to prepare for planned maintenance events on your SQL Managed Instance resources. They alert you 24 hours before a planned maintenance event. Advance notifications work hand-in-hand with SQL Maintenance Windows – with the two combined, you gain control over when your managed instances receive updates and receive a notification ahead of time.

Read on to learn more about how this works. There is a bit of setup involved to subscribe to these, though Uros provides a link to a guide on how to do it.

Leave a Comment

Postgres Vacuuming and Transaction ID Wraparound

Andreas Scherbaum explains what Transaction ID Wraparound is and how you can avoid it:

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to manage concurrent access to the database, and store each version of a tuple in a table. To keep the database efficient, PostgreSQL employs a cleanup process called VACUUM, which removes unnecessary tuples. This not only saves disk space and keeps the database size under control, it also prevents the dreaded “Transaction ID Wraparound” issue. Regularly running VACUUM is crucial, including on the template0 database, which is normally handled by the Autovacuum daemon. However, sometimes manual intervention is necessary.

Read on to learn more.

Leave a Comment

An Overview of gganimate

Dario Radecic shows off a neat library:

The main criticism people have when it comes to ggplot2 is the static nature of the charts it has to offer. Truth be told, it will never be an interactive visualization king like Highcharts, but it doesn’t mean animation is out of the picture.

Meet R gganimate – a natural extension of ggplot2 that allows you to visualize your data change through time or some other variable, and then render and export the chart as a set of PNGs, or a single GIF/MP4.

Click through to learn more about it. I remembered the original gganimate and was going to say, “Wow, I hadn’t heard of that library in forever.” But it turns out that Thomas Lin Pedersen built a newer version of the library and has added in quite a bit of functionality since the last time I looked. H/T R-Bloggers.

Leave a Comment

New Video: Online Passive-Aggressive Algorithms

I have a new video:

In this video, I cover the series of classification algorithms with the best possible name: online passive-aggressive algorithms.

I remember, when reading up on this, being incredulous that the idea even worked. But it turns out that it’s actually pretty good in practice, especially on constrained hardware. Still, this is definitely an algorithm you’d want to test in comparison to others before jumping right in, as there’s a risk you can end up with terrible results.

Leave a Comment