
Category: Storage

Could Not Clear Differential Bitmap

Jack Vamvas takes us through a reason why you might get error 3041:

An error message has started appearing in the SQL Server Error Logs during a nightly full backup.

Could not clear ‘DIFFERENTIAL’ bitmap in database ‘Database1’ because of error 9002. As a result, the differential or bulk-logged bitmap overstates the amount of change that will occur with the next differential or log backup. This discrepancy might slow down later differential or log backup operations and cause the backup sets to be larger than necessary. Typically, the cause of this error is insufficient resources. Investigate the failure and resolve the cause. If the error occurred on a data backup, consider taking a data backup to create a new base for future differential backups.

Click through for the root cause and solution.
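In the meantime, error 9002 in that message means the database's transaction log was full when the backup tried to clear the bitmap. A quick check along these lines can confirm it (a minimal sketch; Database1 comes from the message above, and the backup path is hypothetical):

```sql
-- Log usage across all databases; a full log shows up here
DBCC SQLPERF (LOGSPACE);

-- Or, for just the affected database (SQL Server 2012 and later)
SELECT used_log_space_in_percent,
       total_log_size_in_bytes / 1048576.0 AS total_log_mb
FROM Database1.sys.dm_db_log_space_usage;

-- Per the message text: once the cause is resolved, take a new full
-- backup to establish a fresh base for future differential backups
BACKUP DATABASE Database1
    TO DISK = N'X:\Backups\Database1_full.bak';  -- hypothetical path
```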


SQL Server Data File Size Limit: 16 TB

Dave Mason learned the hard way that data files max out at 16 TB in SQL Server:

I could see that most of the stored proc execution calls were coming from the same server, which was a processing server running a web service. The customer decided to shut down the service and have a quick discussion with his dev team to see if there had been any recent changes deployed. We called it a night. I got a text the next morning. The server was rebooted, and performance had improved noticeably. None to my surprise, I got another call later that day: performance had deteriorated again.
I got back online and looked at all the same things I’d looked at before and still was puzzled.

I think this is a case where Swart’s Ten Percent Rule is easier to violate than most: terabyte-sized databases are fairly common these days, though most of them probably have multiple data files to help with piecemeal recovery.
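If you want to know how much headroom your own data files have, a query like this gets you there (a sketch; sys.master_files reports size in 8 KB pages, and the 16 TB cap works out to 2,147,483,648 pages):

```sql
-- Data file sizes and their distance from the 16 TB per-file cap
SELECT DB_NAME(database_id)                      AS database_name,
       name                                      AS logical_name,
       size / 128.0 / 1024 / 1024                AS size_tb,     -- pages -> TB
       CAST(size AS bigint) * 100.0 / 2147483648 AS pct_of_cap
FROM sys.master_files
WHERE type_desc = 'ROWS'   -- data files only
ORDER BY size DESC;
```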


Erasure Coding In Hadoop

Guy Shilo explains erasure coding, a new feature in Hadoop 3:

The benefits are, of course, space savings and, for large files, improved performance (blocks striped across datanodes can be read in parallel, and fewer blocks are written because there is no 3x replication). The larger the file, the more notable the performance gain.

Erasure coding is disabled by default, and you can enable it for individual directories in HDFS. Some articles, like this one, suggest that best practice is to enable erasure coding only for “cold” data that you do not write often, and to use regular replication for “hot” data. However, in my tests I did not witness any problems dealing with hot data (maybe they only become evident at larger scales).

Click through for the full story on how it works.
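For a sense of what "enable it for individual directories" looks like, the commands in Hadoop 3 are short (a sketch; the directory path is made up, and RS-6-3-1024k is one of the built-in Reed-Solomon policies):

```
# List the erasure coding policies the cluster knows about
hdfs ec -listPolicies

# Enable a built-in policy (6 data blocks + 3 parity blocks)
hdfs ec -enablePolicy -policy RS-6-3-1024k

# Apply it to a directory of cold data, then verify
hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k
hdfs ec -getPolicy -path /data/cold
```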


Creating An Azure Storage Account

John Morehouse walks us through setting up an Azure Storage Account through the Azure Portal:

Azure offers a lot of features that enable IT professionals to really enhance their environment. One feature that I really like about Azure is storage accounts. Disk is relatively cheap, and that continues to hold true in the cloud. For less than $100 per month, you could get up to 5 TB of storage, including redundancy to another Azure region.

Read on to learn how to set up one of these.
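If you would rather script it than click through the Portal, the Azure CLI equivalent is brief (a sketch; the account and resource group names are hypothetical, and Standard_GRS provides the cross-region redundancy mentioned above):

```
az group create --name storage-demo-rg --location eastus

az storage account create \
    --name storagedemo1234 \
    --resource-group storage-demo-rg \
    --location eastus \
    --sku Standard_GRS \
    --kind StorageV2
```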


Availability Groups And Read-Only Filegroups

Allan Hirt walks us through a couple of scenarios involving databases with read-only filegroups using Always On Availability Groups:

A question came across my inbox this week which I decided to investigate: what happens if you have read-only filegroups as part of your database, and you want to use Always On Availability Groups? Let’s find out.
First, I have a database with two filegroups: one read-write (PRIMARY) and one read-only (ROFG).

Click through for a demonstration of this, as well as a different scenario in which you might want only the read-write data on the secondary.
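The starting point Allan describes takes only a couple of statements to reproduce (a sketch; the database name and file paths are hypothetical, while ROFG matches the filegroup name in the quote):

```sql
-- A database with a read-write PRIMARY filegroup and a second filegroup
CREATE DATABASE RODemo
ON PRIMARY
    (NAME = RODemo_data, FILENAME = N'D:\Data\RODemo_data.mdf'),
FILEGROUP ROFG
    (NAME = RODemo_ro,   FILENAME = N'D:\Data\RODemo_ro.ndf')
LOG ON
    (NAME = RODemo_log,  FILENAME = N'L:\Logs\RODemo_log.ldf');

-- Mark the second filegroup read-only (requires exclusive access)
ALTER DATABASE RODemo MODIFY FILEGROUP ROFG READ_ONLY;
```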


Azure Data Lake Storage Generation 2

James Baker announces updates to Azure Data Lake Storage Gen2:

As we’ve discussed many times, the performance of the storage layer has an outsized impact on the total cost of ownership (TCO) for your complete analytics pipeline. This is due to the fact that every percentage point improvement in storage performance results in that same percentage reduction in the requirement for the very expensive compute layer. Given that the disaggregated storage model allows us to scale compute and storage independently, that percentage reduction in compute requirement results in almost the same (compute typically equates to 90 percent of the TCO) reduction in TCO.
So, when I say that ADLS Gen2 provides performance improvements ranging from 10 to 50 percent over existing storage solutions, depending on the nature of the workload, this equates to VERY significant reductions in the monthly analytics spend. It also has the added benefit of providing your insights sooner!
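To put numbers on that model: a 20 percent storage performance improvement implies a 20 percent smaller compute requirement, and with compute at roughly 90 percent of TCO, that works out to about an 18 percent reduction in the total bill.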

Check out all of the changes.


Premium Blob Storage In Azure

James Serra describes a new tier of Azure Blob Storage:

As a follow-up to my blog post Azure Archive Blob Storage, Microsoft has released another storage tier called Azure Premium Blob Storage (announcement).  It is in private preview in the US East 2, US Central, and US West regions.

This is a performance tier in Azure Blob Storage, complementing the existing Hot, Cool, and Archive tiers.  Data in Premium Blob Storage is stored on solid-state drives, which are known for lower latency and higher transactional rates compared to traditional hard drives.

It is ideal for workloads that require very fast access times, such as interactive video editing, static web content, and online transactions.  It also works well for workloads that perform many relatively small transactions, such as capturing telemetry data, message passing, and data transformation.

It’s in private preview for now, but my guess is that it’ll be available to the general public soon enough.


Using IO Cache To Speed Up Spark Jobs

Chris Seferlis looks at what the HDInsight team has done to speed up Apache Spark jobs:

The big news here is the recently released preview of HDInsight IO Cache, which is a new transparent data caching feature that provides customers with up to 9X performance improvement for Spark jobs, without an increase in costs.

There are many open-source caching products in the ecosystem: Alluxio, Ignite, and RubiX, to name a few big ones. The IO Cache is based on RubiX; what differentiates RubiX from other comparable caching products is its approach of using SSDs and eliminating the need for explicit memory management, whereas comparable products reserve operating memory for caching the data.

Read on for more details.


Finding Databases With Multiple Data Or Log Files

Lori Brown has a couple of quick scripts to help find databases made up of several data or log files:

This might be kind of basic, but since I am working on a comprehensive script to discover things that a DBA really needs to know about, I made a couple of queries that produce a list of the databases that have multiple files, along with the locations of the physical files.  One query finds multiple data files (MDFs) and the other looks for multiple transaction log files (LDFs).  These will also find the Filestream file locations.  Since I often have to take on instances without ever having seen them, it is good to know about little things like this.

This script might be helpful in finding minor performance gains by looking for places to add data files or remove log files.
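If you want something in the same spirit without waiting on Lori's comprehensive script, a query along these lines works (a sketch, not her exact code; swap 'ROWS' for 'LOG' to find multiple log files instead):

```sql
-- Databases with more than one data file, with physical locations
SELECT DB_NAME(mf.database_id) AS database_name,
       mf.name                 AS logical_name,
       mf.type_desc,
       mf.physical_name
FROM sys.master_files AS mf
WHERE mf.type_desc IN ('ROWS', 'FILESTREAM')   -- includes Filestream locations
  AND mf.database_id IN (
        SELECT database_id
        FROM sys.master_files
        WHERE type_desc = 'ROWS'
        GROUP BY database_id
        HAVING COUNT(*) > 1
      )
ORDER BY database_name, mf.type_desc;
```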


Running SQL Server On Cluster Shared Volumes

Sreekanth Bandarla continues a series on Cluster Shared Volumes:

In the previous part of this series, we saw what a Cluster Shared Volume is, its advantages, and other considerations to keep in mind when deploying CSVs for SQL Server workloads. In this article, I will walk through the actual installation of a failover cluster instance leveraging CSVs.

To begin with, I will walk you through my cluster setup from a 20,000-foot view. I created two brand-new VMs running Windows Server 2012 R2 and renamed them accordingly. Nothing special with respect to disk drives at this point: just basic VMs with a system drive (C$).

The rest of the story is over at SQLShack.
