Press "Enter" to skip to content

Curated SQL Posts

Backup Types and Recovery Models in SQL Server

I have a new video:

In this video, I cover the three recovery models available to SQL Server, describe (most of) the types of backups you can take, and spend a bit of time covering Recovery Point Objective & Recovery Time Objective.

The advice for video length that people tend to give is 8-10 minutes. I typically average closer to 15 minutes per video. This one is nearly 30 minutes long because there’s just so much information to cover, even with me repeatedly saying “Don’t worry, I’ll cover this bit in a future video.”

Comments closed

Writing Data to an Unattached Lakehouse via Fabric Notebook

Prathy Kamasani does a bit of movement:

Regardless of which architecture we follow, during stages of data integration and transformation there’s always a step to move data from one location to another. And, we work with multiple tables, schemas, and even lake houses.Same goes with Fabric Notebooks. I often find myself in scenarios where I don’t want to attach Lakehouse to my notebook, but I do want to read or write data from various bakehouses.

I recently blogged about a way to achieve this as part of documenting your workspaces. In that post, I described how to write data to a workspace that was not attached to the notebook. I used MsSparkUtil(renamed to NotebookUtils) to mount and then write data in the Lakehouse as Delta tables.

Read on for the answer.

Comments closed

Converting Audit Files to CSV via Read-SqlXEvent

Patrick Keisler finds a work-around:

Reading the contents of an audit event file can be accomplished using either Management Studio or the T-SQL function, sys.fn_get_audit_file. However, sometimes a customer may want to use a third-party tool to read and aggregate audit records, and some of those tools do not have the ability to read the binary audit file. In that case, the customer will just use sys.fn_get_audit_file to write the audit records to a database and then use the third-party tool to ingest those records from the database.

What if that third-party tool cannot even read from SQL Server? That happened to me recently where the only option was to read from a text-based file.

Read on for Patrick’s solution to the problem and a real pain point you’ll find along the way.

Comments closed

Thoughts on Combining UPDATE Operations

Brent Ozar thinks about consolidation:

I’ve got a BEGIN TRAN in there before the updates just so I can test the same queries repeatedly, and roll them back each time. The execution plan for the updates is quite nice: SQL Server divebombs into the supporting indexes:

Relatively few rows match, so our query does less than 1,000 logical reads – way less than there are pages in the table. In this case, separate UPDATE statements make sense.

Brent then continues with, but what do we do when we need to perform multiple independent scans of the same table? Read on for Brent’s answer, but definitely check out Thomas Franz’s comments, which indicate a potential complicating factor.

Comments closed

Ingesting Blob Storage Data into SQL Server

Andy Brownsword brings in some data:

We may associate consuming data from Azure Storage with tools like Data Factory or even SSIS as we saw recently. We don’t always need the middle man though.

Here we’ll demonstrate how to use an External Data Source to perform the ingestion directly into SQL Server.

Click through for the solution. As a quick note, the TYPE attribute that Andy uses in CREATE EXTERNAL DATA SOURCE was necessary from SQL Server 2016 through SQL Server 2019, but no longer exists for SQL Server 2022. Instead, for SQL Server 2022, you’d switch the LOCATION to start with abs:// for Azure Blob Storage and PolyBase would infer the type from the protocol.

Comments closed

Functions in JOIN and Performance Problems

Steve Stedman lays out a warning:

When writing SQL queries, it’s easy to focus on getting the right results without thinking too much about performance. One common mistake that can lead to significant slowdowns is using functions in the JOIN ON clause of a SELECT statement.

While SQL Server supports a wide range of built-in functions, using them incorrectly—especially in the join conditions—can severely impact performance. Let’s break down why this happens, how it can affect your query speed, and what you can do to avoid it.

Read on for the answer.

Comments closed

Restarting Failed Control Flows in Azure Data Factory

Meagan Longoria doesn’t want to repeat good work:

I presented at SQL Saturday Pittshburgh this past weekend about populating your data warehouse with a metadata-driven, pattern-based approach. One of the benefits I mentioned is that it’s easy to employ this pattern for restartability.

For instance, let’s say I am loading data from 30 tables and 5 files into the staging area of my data mart or data warehouse, and one of table loads fails. I don’t want to reload the other tables I just loaded. I want to load the ones that have not been recently loaded. Or let’s say I have 5 dimensions and 4 facts, and I had a failure loading a fact table. I don’t want to reload my dimensions, and I only want to reload the failed facts. How do we accomplish this?

Read on to learn how.

Comments closed

Configuration Is Code

Steve Jones has a public service announcement:

I posted a note on Twitter/X with this quote: “The content updates had not previously been treated as code because they were strictly configuration information.” This is from testimony given by Crowdstrike to a US Congressional committee in trying to explain how they grounded much of the airline industry a few months ago. That was a mess of a situation, and apparently, the vendor didn’t think their configuration was part of their code.

That’s an amazing viewpoint to me. The fact that any developer or manager thinks that their configuration data isn’t a part of their code is worth testing. Yet, I see this attitude all the time, where developers, QA, managers, and more think that the code is the only thing that changes or doesn’t change, ignoring the fact that there are configuration items that affect the code and need to be managed appropriately. Certainly, if the config data were in enums rather than in a file or database they’d feel differently.

Read on for Steve’s extended thoughts. I can understand the urge to call something “just a configuration file” so that you don’t have to do as much work. But that can lead to disaster.

Comments closed