
Category: Storage

T-SQL Snapshot Backups on Hyper-V

Anthony Nocentino takes a backup:

If you’ve been following my T-SQL Snapshot Backup series, you’ve seen this technique work on bare-metal and standard VM deployments where database files live on volumes directly presented to the SQL Server OS. In this post, I’m bringing T-SQL Snapshot Backup into a Hyper-V cluster environment, with database files on VHDXs backed by a Pure Storage FlashArray Cluster Shared Volume (CSV). Hyper-V adds a few extra layers to manage at the hypervisor level, but the SQL Server side of the story is identical. Let’s walk through it.
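On the SQL Server side, the pattern Anthony describes boils down to a freeze/thaw pair around the storage snapshot. A minimal sketch of the SQL Server 2022+ syntax, with a placeholder database name and backup path:

```sql
-- Freeze write I/O so the array can take a crash-consistent snapshot
ALTER DATABASE [MyDatabase] SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON;

-- (take the FlashArray/CSV snapshot at the storage layer here)

-- Record the backup in msdb and thaw I/O; only metadata goes into the .bkm file
BACKUP DATABASE [MyDatabase]
TO DISK = N'D:\Backups\MyDatabase.bkm'
WITH METADATA_ONLY, FORMAT;
```

The `METADATA_ONLY` backup both writes the metadata file needed for restore and resumes I/O on the suspended database.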

Click through to see how it all works. It has taken me a long time to accept the idea that storage volume snapshots could fit the bill, and it’s been people like Anthony, Andy Yun, and Mark Wilkinson who have allayed my concerns over the years.


Alternative Storage Engines for PostgreSQL

Christophe Pettus puts together a list:

PostgreSQL 12 shipped the table access method API in October 2019, and the community spent the next six years figuring out what to do with it. The early prediction was that within a few releases we would have a thriving ecosystem of pluggable storage engines — columnar for analytics, undo-log for OLTP, in-memory for hot workloads — and the heap would become “the default, not the only choice.”

That is almost what happened. The ecosystem exists. It is thriving in the sense that there are more credible options now than at any point in the project’s history. It is also messier than the early vision implied, more concentrated around a couple of design philosophies than was expected, and littered with the bones of projects that ran out of funding or hit walls in the API itself. Before the next post in this series puts numbers on any of this, it is worth taking stock of what is actually out there, what each project is trying to do, and where the architectural fault lines run.
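The table access method API surfaces in plain DDL via the `USING` clause. A quick sketch of how a pluggable engine shows up in practice (the `columnar` AM is an illustration of what an installed extension might register, not a built-in):

```sql
-- List the table access methods installed in this cluster (amtype 't' = table)
SELECT amname, amhandler FROM pg_am WHERE amtype = 't';

-- heap is the default access method; naming it explicitly is equivalent
CREATE TABLE measurements (id bigint, reading numeric) USING heap;

-- with a columnar extension installed, an alternative engine is one keyword away:
-- CREATE TABLE measurements_col (id bigint, reading numeric) USING columnar;
```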

Click through for a survey of who’s tried what and what’s still around today.


Moving System Databases in SQL Server

Rich Benner hires some movers:

As consultants, we often see system databases existing on the C drive on SQL Servers. There are some issues with this setup, and the biggest is: if one of your system databases grows and fills your C drive, you will likely crash the OS. If that happens, we’re in big trouble. Therefore, moving system databases becomes a necessary operation at times.

This is such a common issue because the default locations are set to C for these databases and that’s where they end up on fresh installs 99% of the time. Don’t worry! If you’re in this situation you’re not alone.
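To get a feel for the moving parts, here is a minimal sketch (not Rich's script) of checking current file locations and relocating tempdb, the simplest case; the paths are placeholders, and master requires startup-parameter changes rather than `ALTER DATABASE`:

```sql
-- Where do the system database files live right now?
SELECT DB_NAME(database_id) AS database_name,
       name AS logical_name,
       physical_name
FROM sys.master_files
WHERE database_id IN (DB_ID('master'), DB_ID('model'), DB_ID('msdb'), DB_ID('tempdb'));

-- tempdb: point the files at the new drive; they are recreated there
-- the next time the SQL Server service restarts
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = N'E:\SQLData\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = N'E:\SQLLogs\templog.ldf');
```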

Click through for a query that shows which databases are on which drive and how to migrate databases post-install.


Storage Load Testing SQL Server

Anthony Nocentino designs a test:

I’ve been doing storage load tests for SQL Server for a long time, both as a consultant and now in my work at Everpure, and I see the same patterns over and over. Someone spins up a VM with two vCPUs, points it at a storage subsystem (cloud or on-prem), runs a thousand threads at it, and then concludes that the storage stinks. Or the opposite, where they buy a 64 gigabit HBA, plug it into the wrong PCIe slot, and wonder why they’re leaving half of the capacity on the table.

Designing a good load test isn’t about pushing the biggest number you can find. It’s about knowing what your gear is supposed to do, driving load until you find the point where the system breaks down, and measuring at the right places along the way. In this post, I’m going to walk you through how I think about designing a storage load test, the tools I reach for, and the pitfalls that will invalidate your results if you’re not paying attention.
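As a concrete starting point, a tool like fio lets you encode "what the gear is supposed to do" as a job file. This is an illustrative config approximating an OLTP-style 8 KB random read pattern, not Anthony's actual test; target device, queue depth, and job count all need tuning to your hardware:

```ini
; illustrative fio job: 8K random reads, direct I/O, fixed duration
[global]
ioengine=libaio
direct=1
time_based=1
runtime=120
ramp_time=15

[sqlserver-randread]
filename=/dev/sdb
rw=randread
bs=8k
iodepth=32
numjobs=4
```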

Click through for Anthony’s thought process.


SQL Server 2025 Installation on a Disk with Large Sector Size

Reitse Eskens runs into a problem:

This warning felt benign, more like a note that you might run into errors. Well, as I found out, it leads to a SQL Server installation that just fails.

The error logs fill up with stack dumps and a fatal error that makes no sense. The installation log, however, shows you a more meaningful error.
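If you suspect this class of problem, Windows will tell you what sector sizes the volume reports, and Microsoft documents a registry workaround for NVMe drives advertising sectors larger than 4 KB. A sketch (run from an elevated prompt; verify the current guidance before applying the registry change):

```
:: Check the logical and physical sector sizes Windows reports for the volume
fsutil fsinfo sectorinfo C:

:: Documented workaround: force the NVMe driver to report at most 4 KB
:: physical sectors, then reboot
reg add "HKLM\SYSTEM\CurrentControlSet\Services\stornvme\Parameters\Device" /v ForcedPhysicalSectorSizeInBytes /t REG_MULTI_SZ /d "* 4095" /f
```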

Click through for the error message, as well as one way to fix the problem.


A Primer on Data Storage in PostgreSQL

Grant Fritchey shares some thoughts:

The whole idea behind a database is the ability to persist the data. You want your inventory of widgets to get stored so you can look at it later. That means writing out to disks. However, what is writing to disk and where is it being written? Unlike SQL Server which has one (or more) big file for all data, PostgreSQL has a collection of a large number of files. There is a methodology and structure to these files that you need to understand in order to later understand how the data gets written to and retrieved from these files.

While we’re going to be very focused on file, page, folder, etc., throughout this article, that’s just part of the physical nature of persisting your data. What is being persisted is still the logical information you’re most interested in – rows and columns. I just wanted to emphasize the distinction between the two here.
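You can poke at this file-per-relation layout directly from psql; a small sketch using a catalog table as the example:

```sql
-- The cluster's on-disk root; every database's files live beneath it
SHOW data_directory;

-- Path (relative to the data directory) of the file backing a table
SELECT pg_relation_filepath('pg_class');
-- e.g. base/16384/1259 : database OID directory, then the relation's filenode
```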

Click through to see how PostgreSQL stores information.


Columnstore Storage Structures

Hugo Kornelis continues a series on storage structures:

In the first part of this series, I described the storage structure and access patterns for SQL Server’s traditional storage structure: on-disk rowstore indexes (heaps and B-trees).

Columnstore indexes were introduced in SQL Server 2012. In that version only nonclustered columnstore indexes were supported (so they stored a copy of the data in the included columns, with the actual table data still stored in an on-disk rowstore heap or clustered index. And they made the table read only! That restriction was lifted in SQL Server 2014, when clustered columnstore indexes were also added. SQL Server 2016 then added the option to create additional nonclustered (on-disk rowstore) indexes on a clustered columnstore. And, also since SQL Server 2016, we now have ordered columnstore indexes – in my opinion a somewhat misleading name.
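The three variants Hugo mentions map to three flavors of DDL; a sketch with hypothetical table names:

```sql
-- Nonclustered columnstore: a columnar copy of selected columns
-- alongside the rowstore table
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_sales
ON dbo.Sales (CustomerID, OrderDate, Amount);

-- Clustered columnstore: the table itself is stored in columnar format
CREATE CLUSTERED COLUMNSTORE INDEX cci_history ON dbo.SalesHistory;

-- "Ordered" columnstore: sorts on the listed column while building the index,
-- which helps segment elimination but does not guarantee ongoing order --
-- hence Hugo's point about the name being misleading
CREATE CLUSTERED COLUMNSTORE INDEX cci_history
ON dbo.SalesHistory ORDER (OrderDate);
```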

Read on to see how SQL Server manages this data.


Dealing with Long-Running I/O Requests in SQL Server

Rebecca Lewis has a two-parter. First up is finding instances of long-running I/O Requests:

When diagnosing storage or latency issues, one SQL Server message comes up more than most:

“SQL Server has encountered X occurrence(s) of I/O requests taking longer than 15 seconds to complete on file…”

Where X might be 1, 5 or 50, and it could list a file from any one of your databases. When you see this, the next good question is when did it happen and where.
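The warning lands in the SQL Server error log, so one minimal starting point for the "when and where" question (a sketch, not Rebecca's inventory-wide script) is:

```sql
-- Search the current error log (0) for the 15-second I/O warning
EXEC xp_readerrorlog 0, 1, N'I/O requests taking longer than 15 seconds';
```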

And then the question is, what do you do about it? Rebecca provides some guidance:

In a previous post, I shared a script to detect the I/O requests taking longer than 15 seconds warning across your SQL Server inventory.  Now let’s talk about what to do when you find it.

Here are five of the most common causes with some tips to investigate each:

The neat part is, it’s not always due to slow storage or bad hardware.
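When storage is on the suspect list, a common first check (a sketch, not necessarily one of Rebecca's five) is per-file latency from the virtual file stats DMV:

```sql
-- Average read/write latency per database file since instance startup
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```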


Thoughts on On-Disk Rowstore in SQL Server

Hugo Kornelis starts a series on storage structures:

When a query is slow, it is often caused by inefficient access to the data. So our tuning work very frequently comes down to figuring out how data was read, and then massaging our queries or database structures to get SQL Server to access the data in a more efficient way.

So we look at scans, seeks, and lookups. We know that scans are good when we access most of the data. Or, in the case of an ordered scan, to prevent having to sort the data. We know that seeks are preferred when there is a filter in the query. And we know that lookups represent a good tradeoff between better performance and too many indexes, but only if the filter is highly selective.

All of the above is true. And all of it is highly generalized. And hence, often, not true enough to be actually useful.
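The generalizations Hugo is pushing back on are easy to state in DDL; a sketch with a hypothetical table:

```sql
-- With no suitable index, this filter produces a clustered index (or heap) scan
SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = 42;

-- A narrow index turns the filter into a seek plus key lookups for OrderDate,
-- which pays off only when CustomerID = 42 is highly selective...
CREATE INDEX ix_orders_customer ON dbo.Orders (CustomerID);

-- ...while including the column trades index size for removing the lookups
CREATE INDEX ix_orders_customer_covering
ON dbo.Orders (CustomerID) INCLUDE (OrderDate);
```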

Read on for an overview of the most common option.


Cloud Storage Archival via Parquet Files

Joey D’Antoni builds a tool:

What I’m writing about today has nothing to do with analytics, per se. It has everything to do with cloud storage, and the way operations there are priced. Specifically, metadata operations–in the demo code I’ve shared we’re going from five files to one, but you can imagine going from a much larger number of files to much smaller number of files. You may ask–“Joey that sounds dumb, why are you reinventing zip and iso files”. Well, the main reason is that many cloud operations are priced on the number of objects–for example if you had to calculate a checksum across a number of files on S3. (For files/objects that were created before S3 automatically did checksums).
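The pricing logic Joey describes is easy to see with a back-of-the-envelope calculation. The price below is a hypothetical placeholder, not an actual S3 rate, and the file counts are made up; the point is only that per-object request charges scale linearly with object count, so consolidation cuts them proportionally:

```python
# Back-of-the-envelope: cost of per-object requests before and after
# consolidating many small files into a few large ones.
PRICE_PER_1000_REQUESTS = 0.0004  # hypothetical $ per 1,000 GET/metadata requests

def monthly_request_cost(num_objects: int, scans_per_month: int) -> float:
    """Cost of touching every object once per scan (checksums, listings, etc.)."""
    requests = num_objects * scans_per_month
    return requests / 1000 * PRICE_PER_1000_REQUESTS

# 1,000,000 small archived files scanned daily vs. 1,000 consolidated files
before = monthly_request_cost(1_000_000, 30)
after = monthly_request_cost(1_000, 30)
print(f"before: ${before:.2f}/month, after: ${after:.4f}/month")
```

Going from a million objects to a thousand cuts the request bill by the same factor of a thousand, independent of total data size.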

Click through for more information on how it works, as well as a link to the GitHub repo.
