
Curated SQL Posts

Backup Encryption Performance

Matthew McGiffen runs some tests:

Unlike TDE, there is some extra CPU overhead when you take an encrypted backup as the data has to be encrypted before being written to disk – whereas with TDE the data is already encrypted. Backup times however are unlikely to be affected significantly as the bottleneck is usually going to be the time it takes to physically write the data to disk. The CPU processing should take a fraction of that time.

Matthew’s tests are on a ~9GB database, which is large enough to estimate differences without having us wait all day to compare.
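
For reference, the T-SQL for an encrypted backup looks roughly like the following. This is a minimal sketch rather than Matthew's test harness: the database name, file path, and certificate name are hypothetical, and you need a certificate (or asymmetric key) in master before you start.

-- Hypothetical names throughout; requires a certificate in master.
BACKUP DATABASE [MyDatabase]
TO DISK = N'D:\Backups\MyDatabase.bak'
WITH COMPRESSION,
    ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert),
    STATS = 10;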


SQL Agent and Memory Consumption

Sean Gallardy performs some troubleshooting:

I was asked if I knew any reason why SQL Agent would be using “a bunch” of memory and more CPU than normal. You and I, reader, now have the same information to go on. What do you do? Think about it for a minute or two if you want before reading on. I’m not saying the way I did it was the way to do it or the only way, but I gave an action plan for data capture and, once data was in hand, it was solved in a few minutes. There are all sorts of things that can cause this in a program, but Agent typically doesn’t use a whole lot, unless it’s executing many concurrent T-SQL jobs (which it really shouldn’t be – get enterprise-level scheduling) and even then, the memory should deallocate.

My first inclination turned out to be right: I figured it had to do with a job running. I wasn’t sure about the specifics, but Sean takes us through the troubleshooting process.
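
If you want to form your own hypothesis before reading Sean's answer, one reasonable first check – a sketch, not necessarily the capture plan Sean used – is to see which Agent jobs are currently executing:

-- List Agent jobs still running in the current Agent session;
-- long-running concurrent T-SQL job steps are a common cause of
-- Agent memory growth.
SELECT j.name AS job_name,
       ja.start_execution_date
FROM msdb.dbo.sysjobactivity AS ja
    INNER JOIN msdb.dbo.sysjobs AS j
        ON ja.job_id = j.job_id
WHERE ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
    AND ja.start_execution_date IS NOT NULL
    AND ja.stop_execution_date IS NULL;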


Could Not Update the Metadata that Indicates Database Enabled for CDC

Jose Manuel Jurado Diaz troubleshoots an issue in Azure SQL DB:

Today, we got an error message while trying to enable CDC for a database using the sqladmin user. Our customer got the error message: Msg 22830, Level 16, State 1, Procedure sys.sp_cdc_enable_db_internal, Line 283 [Batch Start Line 0]
Could not update the metadata that indicates database XYZ is enabled for Change Data Capture. The failure occurred when executing the command ‘SetCDCTracked(Value = 1)’. The error returned was 33171: ‘Only active directory users can impersonate other active directory users.’. Use the action and error to determine the cause of the failure and resubmit the request.

Read on to understand what the problem is and how you can resolve it.
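
For context, the failing call is the standard CDC enablement procedure, run in the user database. The sketch below shows one common remediation pattern for this class of impersonation error – making a SQL authentication login the database owner before enabling CDC – but check the post for the actual root cause and fix; the database and login names come from the error message above.

-- Hedged remediation sketch: set a SQL authentication login as owner...
ALTER AUTHORIZATION ON DATABASE::[XYZ] TO [sqladmin];
GO
-- ...then retry the call that raised Msg 22830.
EXEC sys.sp_cdc_enable_db;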


Analyzing Big-O Notation in Polyglot Notebooks

Matt Eland brings me back to college:

Polyglot Notebooks is a great way of running interactive code experiments mixed together with rich markdown documentation.

In this short article I want to introduce you to the #!time magic command and show you how you can easily measure the execution time of a block of code.

This can be helpful for understanding the rough performance characteristics of a block of code inside of your Polyglot Notebook.

In fact, we’ll use this to explore the programming concepts behind Big O notation and how code performance changes based on the number of items.

I like this for two reasons. First, because a visual indicator of Big-O notation is helpful for students learning about the topic. Second, because that’s not the only thing you can do with the #!time magic.


Managing Database Privileges in Postgres

Ryan Booz limits database access:

We discussed how this can be done manually with a GRANT command each time an object is created; however, that is time-consuming to manage and it’s easy to miss a detail.

Instead, PostgreSQL provides a method for setting default privileges which are granted on behalf of the object owner as database objects are created. Using default privileges, a role can prepare the database ahead of time to ensure that consistent access privileges are applied while easing the management burden over time.

But how do you go about creating a set of roles and default privileges that will provide the right level of control and access? Let’s dig a little deeper.

RBAC on groups is definitely the way to go, and Ryan shows us how.
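
To make the pattern concrete, here is a minimal sketch with hypothetical role and schema names: a read-only group role that can read everything in schema app today, plus a default privilege so tables app_owner creates later are covered automatically.

-- Read-only group role (hypothetical names).
CREATE ROLE app_readonly NOLOGIN;
GRANT USAGE ON SCHEMA app TO app_readonly;
-- Cover tables that already exist...
GRANT SELECT ON ALL TABLES IN SCHEMA app TO app_readonly;
-- ...and tables app_owner creates from now on.
ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT SELECT ON TABLES TO app_readonly;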


ALL and ALLSELECTED in DAX

Reza Rad compares and contrasts:

These two functions in DAX are often used instead of each other: ALL and ALLSELECTED. Their behavior can be similar in some contexts but different in others. In this article and video, I’ll explain the difference between these two functions and when to use each in DAX for Power BI, Analysis Services, or Power Pivot.

Reza has a video as well as a blog post to describe the differences.


The Myth of the DBA-Free Cloud

Matthew McGiffen lays out an explanation:

I was chatting with a cloud consultant who was advising on a large scale migration to AWS. He told me that one of the advantages of going for a PaaS offering (Platform as a Service) was that DBAs were no longer required as backups and restores were handled for you. PaaS services for SQL Server include AWS RDS and Azure SQL Database or Azure SQL Managed Instance.

I found it quite a funny conversation, partly as I don’t think he realised being a DBA was part of my job role, but also because I don’t know a single DBA who spends a significant amount of their time doing backups and restores.

I still remember (through others—I wasn’t in this space yet) the advertising campaign that SQL Server 2005 would completely eliminate the need for a DBA because everything would just work on its own, even sweet database tuning using the Database Tuning Advisor. The same thing applies today: even those DBA-free databases eventually need somebody to optimize them along various dimensions, ensure they are running smoothly, and correct issues if they are not. Perhaps we could call this role the Administrator of a Database or AoD, so as not to scare the DBA-free database vendors. “No, we don’t have DBAs—we just need you to have a few AoDs on staff.”


Query Snowflake Data from Spark

The Big Data in Real World team crosses data platforms:

If your organization is working with lots of data, you might be leveraging Spark for distributed computation. You could also have some or all of your data in a Snowflake data warehouse.

In a situation like this, you might have to expose data in Snowflake to the processes that run on Spark. This is made possible using the Spark Connector for Snowflake.

In this post, we will see what the Spark Connector for Snowflake is and how to use it from Spark to connect to Snowflake and access Snowflake data in your Spark cluster.

Read on for a high-level architecture of how it works and the configuration you’ll need to do to get it running.


Common Date and Time Operations in R

Steven Sanderson works with dates:

Dates and times are essential components in many programming tasks, and R provides various functions and packages to handle them effectively. In this post, we’ll explore some common operations using both the base R functions and the lubridate package, comparing their simplicity and ease of understanding.

I personally prefer the lubridate style of date operation, but it’s nice to have options.
