Press "Enter" to skip to content

Category: Storage

Loading Data from Network-Protected Storage Accounts into OneLake

Matt Basile grabs some data:

AzCopy is a powerful and performant tool for copying data between Azure Storage and Microsoft OneLake, and is the preferred tool for large-scale data movement due to its ease of use and built-in performance optimizations. AzCopy now supports copying data from firewall-enabled Azure Storage accounts into OneLake using trusted workspace access. Now you can use AzCopy to load data into OneLake even from network-protected storage accounts, without compromising on security or performance.

Click through for an explanation of trusted workspace access, followed by the steps to try it out for yourself.
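
As a rough sketch of what the copy can look like (all account, workspace, and path names here are placeholders; the exact OneLake URL format and required flags are covered in the post and the AzCopy documentation):

```shell
# Authenticate with an identity that has access to both sides
azcopy login --tenant-id "<your-tenant-id>"

# Copy from a firewall-enabled storage account into a Lakehouse's Files area;
# --trusted-microsoft-suffixes lets AzCopy send auth tokens to the OneLake endpoint
azcopy copy \
  "https://<storageaccount>.blob.core.windows.net/<container>/data/*" \
  "https://onelake.blob.fabric.microsoft.com/<Workspace>/<Lakehouse>.Lakehouse/Files/data/" \
  --recursive \
  --trusted-microsoft-suffixes "onelake.blob.fabric.microsoft.com"
```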


Equalizing Proxy vs Redirect Rates for OneLake Access

Elizabeth Oldag announces a pricing change:

We’re thrilled to share a major update and simplification to OneLake’s capacity utilization model that will make it even easier to manage Fabric capacity and scale your data workloads. We are reducing the consumption rate of OneLake transactions via proxy to match the rate for transactions via redirect. This means you no longer have to worry about where you are accessing your OneLake data from (via proxy or redirect); both paths will consume your capacity at the same low rate.

Read on to see what this means in practice.


Sundry Causes of Slow Disk Performance

Kevin Hill thinks about I/O:

“SQL Server is slow.”

We’ve all heard it. But that doesn’t always mean SQL Server is the problem. And “slow” means nothing without context and the ability to verify it.

More often than you’d think, poor performance is rooted in the one thing most sysadmins don’t touch until it’s on fire: the disk subsystem.

There are other potential causes as well, such as choosing the wrong RAID array format (like, say, RAID 6 for your extremely busy log files) and limited bandwidth to a SAN. Note that Kevin’s listing of what constitutes acceptable disk performance focuses primarily on on-premises solutions, and may be biased toward direct-attached storage versus a SAN. For cloud databases, spikes of 30-60 seconds are perfectly fine, of course.
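
If you want numbers rather than feelings, a point-in-time look at per-file latency from inside SQL Server is a reasonable starting point. A minimal sketch (what counts as acceptable is for you and your workload to decide):

```sql
-- Average read/write latency per database file since instance startup
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    INNER JOIN sys.master_files AS mf
        ON mf.database_id = vfs.database_id
        AND mf.file_id = vfs.file_id
ORDER BY avg_write_ms DESC;
```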


Checking Disk Free Space on Mount Points in SQL Server

Tom Collins doesn’t need drive letters:

I have some SQL Servers and the file system is organised with mount points. SQL Server is configured to distribute the system files, data, transaction log, and tempdb files across those mount points. As these are production systems, the DBA does not have permanent access to Disk Manager without a request for elevated privileges. But they do have privileges to execute administration-type queries.

Do you have a query that allows the DBA to determine the Total Size, Available Size, and Space Free on the mount points?

Read on for the answer, as well as a link to an older article from Tom on the topic.
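
The usual answer relies on sys.dm_os_volume_stats, which requires only VIEW SERVER STATE rather than OS-level access. A sketch along those lines (Tom's exact query is in the post):

```sql
-- Size and free space for every volume hosting a database file,
-- including mount points that have no drive letter
SELECT DISTINCT
       vs.volume_mount_point,
       vs.logical_volume_name,
       CAST(vs.total_bytes / 1073741824.0 AS decimal(18, 2)) AS total_size_gb,
       CAST(vs.available_bytes / 1073741824.0 AS decimal(18, 2)) AS available_gb,
       CAST(vs.available_bytes * 100.0 / vs.total_bytes AS decimal(5, 2)) AS percent_free
FROM sys.master_files AS mf
    CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;
```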


Multi-Storage Array Database Snapshots in SQL Server

Anthony Nocentino continues a series:

In this post, the fourth in our series, I want to share an example demonstrating SQL Server 2022’s T-SQL Snapshot Backup feature in a scenario where a database spans multiple storage arrays. If you’re dealing with multi-array environments, you’ll appreciate how this technique freezes database write I/O to take coordinated snapshots across volumes on two FlashArrays. In this post, I’ll walk you through the process, point out some of the script’s key elements, and show you how long the write I/O pause takes.

Click through to learn how it all works when you have a database spanning multiple volumes.
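
The T-SQL half of the pattern is short; the snapshots themselves happen on the arrays. A stripped-down sketch (database name and backup path are placeholders; Anthony's post has the full orchestration script):

```sql
-- 1. Freeze write I/O on the database (SQL Server 2022 and later)
ALTER DATABASE MyDatabase SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON;

-- 2. While suspended, take coordinated volume snapshots on BOTH arrays
--    (done outside of T-SQL, via the arrays' tooling or APIs)

-- 3. Record a metadata-only backup, which also releases the suspension
BACKUP DATABASE MyDatabase
    TO DISK = 'D:\Backups\MyDatabase.bkm'
    WITH METADATA_ONLY;
```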


Drawbacks of Version Store in SQL Server

Haripriya Naidu points out that nothing comes for free:

Every feature comes with both advantages and disadvantages, and it is important to be aware of both. With regard to the version store, we have already discussed how to understand it better and how it is enabled.

Today, we will talk about the trade-offs and implications that come along when version store is enabled.

Click through for the downsides. With storage prices and speeds where they are, the tradeoffs of using the version store for RCSI or Snapshot isolation are almost always worth it. But it’s still good to know what those trade-offs are and how you can end up in a degenerate state.
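
One of the practical trade-offs is tempdb consumption, and you can watch it directly. A quick sketch for seeing how much version store space each database is generating:

```sql
-- Version store space consumed in tempdb, per source database
SELECT DB_NAME(database_id) AS database_name,
       reserved_page_count,
       reserved_space_kb / 1024 AS reserved_space_mb
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;
```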


Using PolyBase for Archiving Data

Andy Yun is speaking my language:

One of SQL Server 2022’s new features is something called Data Virtualization. It enables T-SQL to directly query files that reside in Azure object storage or S3-compatible object storage. In my opinion, since SQL Server 2022’s release, it’s one of those underrated capabilities that I think many have glossed over. But I strongly believe that it is insanely useful and you should take a few minutes to learn more!

Read on to learn more. Also, Andy mentions using PolyBase with S3-compatible local storage. As a spoiler, I have a video coming out on January 28th that covers exactly that same topic, though without the benefit of snappy all-flash storage arrays.
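
To make the archiving pattern concrete, here is a rough sketch of what it looks like against an S3-compatible endpoint (endpoint, bucket, credential, and table names are all placeholders; prerequisites such as installing PolyBase, creating a database master key, and enabling the 'allow polybase export' setting are covered in the documentation):

```sql
-- One-time setup: a credential and external data source for an S3-compatible endpoint
CREATE DATABASE SCOPED CREDENTIAL s3_cred
WITH IDENTITY = 'S3 Access Key',
     SECRET = '<access_key_id>:<secret_key>';

CREATE EXTERNAL DATA SOURCE s3_archive
WITH (LOCATION = 's3://s3.storage.local:9000/sqlarchive',
      CREDENTIAL = s3_cred);

CREATE EXTERNAL FILE FORMAT parquet_ff
WITH (FORMAT_TYPE = PARQUET);

-- Archive cold data out of the database as Parquet (CETAS)...
CREATE EXTERNAL TABLE dbo.Sales_Archive_2019
WITH (LOCATION = '/sales/2019/',
      DATA_SOURCE = s3_archive,
      FILE_FORMAT = parquet_ff)
AS
SELECT * FROM dbo.Sales WHERE OrderDate < '2020-01-01';

-- ...and it stays queryable with plain T-SQL
SELECT COUNT(*) FROM dbo.Sales_Archive_2019;
```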


Speed Differences with Separating Data and Log Files

Brent Ozar performs a test:

I’ve already explained that no, it doesn’t make your database server more reliable – and in fact, it’s the exact opposite. But what about performance?

The answer is going to depend on your hardware and workload, but let’s work through an example. I’ll take the first lab workload from the Mastering Server Tuning class and set it up on an AWS i3en.2xlarge VM, which has 8 cores, 64GB RAM, and two 2.5TB NVMe SSDs. (This was one of the cheapest SQL-friendly VM types with two SSDs, but of course there are any number of ways you could run a test like this, including EBS volumes.)

I would expect cloud versus on-premises answers to be quite different, because cloud services tend to throttle you hard on how much storage throughput you’re allowed to have. For that reason, the results make perfect sense in AWS (or Azure or GCP for that matter), but unless your on-prem solution has hard throttles on IOPS or throughput because your sysadmins are monsters, the limits of performance would be in how hard you can push the drives or your storage controllers.

Ultimately, the most appropriate answer is to test your systems and not rely on expectations, especially if you’re shifting from on-premises to a cloud (or vice versa).
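
If you do run your own test, the measurement side is straightforward: snapshot the cumulative file stats, run the workload, and diff. A rough sketch:

```sql
-- Before the workload: snapshot cumulative I/O statistics
SELECT database_id, file_id, num_of_writes, io_stall_write_ms
INTO #io_before
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- ... run the workload here ...

-- After the workload: average write latency during the run, per file
SELECT DB_NAME(a.database_id) AS database_name,
       a.file_id,
       1.0 * (a.io_stall_write_ms - b.io_stall_write_ms)
           / NULLIF(a.num_of_writes - b.num_of_writes, 0) AS avg_write_ms_during_run
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS a
    INNER JOIN #io_before AS b
        ON b.database_id = a.database_id
        AND b.file_id = a.file_id;
```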


It’s Probably Not Data Corruption on Disk

Andy Yun talks storage:

I cannot tell you how many times I’ve encountered scenarios where “this data looks wrong.” Well… can one ensure that it is being retrieved and displayed correctly from the storage media that it resides on in the first place? Are you viewing/validating the data in question correctly? Whatever client/method you are using to review your data – that is suspect and its integrity is in question.

It is technically possible for bits to flip, but that’s also why we have checksums on disk. I’m sure there are people who have experienced storage corruption that changed just enough to cause problems but not enough to be noticeable, but Andy is right on the money.
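
If you do need to rule the storage layer in or out, the checks are cheap. A quick sketch (assuming your databases use the default CHECKSUM page verification; the database name is a placeholder):

```sql
-- Confirm page verification is CHECKSUM (the default for databases
-- created on SQL Server 2005 and later)
SELECT name, page_verify_option_desc
FROM sys.databases;

-- Validate on-disk page integrity; checksum failures point at storage, not SQL Server
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS;

-- Review any corrupt pages SQL Server has already recorded
SELECT * FROM msdb.dbo.suspect_pages;
```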


Azure SQL Managed Instance Extreme Storage Latency

Kendra Little has another caveat emptor message:

What are your stories of unbelievably bad performance from cloud vendors? I’ll go first. For years, Azure SQL Managed Instance’s General Purpose Tier has documented “approximate” storage latency as being “5-10 ms.” This week they added a footnote: “This is an average range. Although the vast majority of IO request durations will fall under the top of the range, outliers which exceed the range are possible.”

How approximate is that 5-10 milliseconds, you might wonder? If you use Azure SQL Managed Instance these days, you will regularly find messages in your SQL Server Error log indicating that all data and log files have experienced latency of up to 60 seconds. At least, 60 seconds is the maximum I’ve observed personally, looking in the logs of several customers’ Managed Instances. Could it be worse? Microsoft hasn’t documented a ceiling. My testing shows that this latency occurs randomly to your workload and is not related to your resource usage: using less IO will not make the errors less likely. You have no way to avoid these storage failures (I don’t see how 15-60 second latency is not a failure), and they can occur anytime.

This is a major strike against SQL Managed Instance General Purpose. Considering the cost of the product, you could buy a new server with direct-attached NVMe storage, have it paid off after one year, have better performance, and get to depreciate the entire expense over a 3-5 year window (something you cannot do with the hardware side of cloud services–you can only depreciate the cost of licensing, assuming you have a 3-year reservation).
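
If you want to check your own instances for the warnings Kendra describes, the error log is searchable from T-SQL. A quick sketch (the search string matches the standard "I/O requests taking longer than 15 seconds" message; adjust as needed):

```sql
-- Search the current SQL Server error log for long-I/O warnings
EXEC xp_readerrorlog
    0,                      -- current error log
    1,                      -- SQL Server log (2 would be the Agent log)
    N'taking longer than';  -- matches the 15-second I/O warning text
```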
