Recently, we were working on a performance testing exercise using a SQL Server database with files in Azure Blob Storage. After creating the database using the default 8 MB size for the data and log files (as in the example above), we wanted to increase the size of all files to be sufficient for the expected workload. IFI was not yet enabled for the SQL Server instance we were working with, and growing the data file from 8 MB to 1 TB took about one minute (using Premium Storage). This was expected, since the data file had to be fully initialized. We expected the log growth to 1 TB to take about as much time, for the same reason. It was very surprising, then, that the same operation on the log file completed in less than one second.
It turns out that this is due to differences in Azure Blob Storage versus traditional storage systems.
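For reference, the growth operation described above can be sketched in T-SQL (the database and file names here are placeholders, not the ones from the article):

```sql
-- Hypothetical database and logical file names; grow both files to 1 TB.
-- The data file must be zero-initialized unless IFI is enabled, which is
-- why it took about a minute. Log files are always zero-initialized by
-- SQL Server, which is why the sub-second log growth on Blob Storage
-- was the surprising part.
ALTER DATABASE [PerfTestDB]
    MODIFY FILE (NAME = N'PerfTestDB_data', SIZE = 1048576MB);

ALTER DATABASE [PerfTestDB]
    MODIFY FILE (NAME = N'PerfTestDB_log', SIZE = 1048576MB);
```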
You look at the numbers again, and now you find that disk latency, which had previously been fine, is completely in the tank during the business day, showing that I/O delays are through the roof. What happened? This demonstrates the concept of a shifting bottleneck: while CPU use was through the roof, the engine was so bogged down that it couldn't generate that much I/O, but once the CPU issue was resolved, queries started moving through more quickly until the next choke point was hit at the I/O limit. Odds are that once you resolve the I/O situation, you will find a new bottleneck. How do you ever defeat a bad guy who constantly moves around and frequently changes form?
Click through for some pointers on disk latency and trying to figure out when it becomes a problem.
Do you need to worry about disk block size?
I would suggest reading the articles below to get a better understanding of this topic.
After that discussion, the next question was: how do we check the disk block size for a given server?
You can do it from the command line using the fsutil utility, but let’s do it with PowerShell.
Read on for the code.
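As a quick sketch of both approaches (the drive letter is an assumption for illustration), the NTFS allocation unit size can be read with fsutil or via WMI from PowerShell:

```powershell
# fsutil approach (elevated prompt; look for "Bytes Per Cluster" in the output):
#   fsutil fsinfo ntfsinfo C:

# PowerShell approach: the BlockSize property on Win32_Volume is the
# volume's allocation unit ("block") size in bytes.
Get-CimInstance -ClassName Win32_Volume -Filter "DriveLetter = 'C:'" |
    Select-Object DriveLetter, Label, BlockSize
```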
Last year Microsoft introduced Azure Cool Blob storage, which cost customers a penny per GB per month in some Azure regions. Now, users have another, lower-cost option in Azure Archive Blob Storage, along with new Blob-Level Tiering data lifecycle management capabilities. So there are now three Azure Blob Storage tiers: Hot, Cool, and Archive.
Azure Archive Blob Storage costs 0.18 cents per GB per month when the service is delivered through the East US 2 region (for comparison, in the same region Hot is 1.8 cents and Cool is 1.0 cents per GB per month). Customers can expect a 99 percent availability SLA (service level agreement) when the service makes its way out of the preview stage.
This is Azure’s response to AWS Glacier. The immediate sticker price is a bit higher, but if there aren’t any incremental costs associated with deletion, uploading, or retrieving files, then it could end up matching Glacier in TCO.
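Using the per-GB prices quoted above for East US 2, a rough monthly storage-only comparison (ignoring write, retrieval, and early-deletion charges, which is exactly where the Glacier TCO question gets decided) looks like this:

```python
# Monthly storage-only cost per GB in US dollars, from the East US 2
# prices quoted above: 0.18, 1.0, and 1.8 cents per GB per month.
PRICE_PER_GB = {"archive": 0.0018, "cool": 0.010, "hot": 0.018}

def monthly_cost(tier: str, size_gb: float) -> float:
    """Storage-only monthly cost; excludes transaction/retrieval fees."""
    return PRICE_PER_GB[tier] * size_gb

# 100 TB kept for one month in each tier:
size_gb = 100 * 1024
for tier in ("hot", "cool", "archive"):
    print(f"{tier:>7}: ${monthly_cost(tier, size_gb):,.2f}")
```

At 100 TB the Archive tier comes out an order of magnitude cheaper than Hot, which is why the retrieval-fee structure, not the sticker price, ends up driving the comparison with Glacier.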
Looking in the SQL Error Log there were hundreds of these combinations in the minutes immediately preceding the job failure:
The operating system returned error 665 (The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x000048a123e000 in file 'E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Error: 17053, Severity: 16, State: 1.
E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17: Operating system error 665(The requested operation could not be completed due to a file system limitation) encountered.
Read on for more information, including a rough idea of how much space the snapshot requires as well as a few workarounds and hints.
We see that all four of the drives show up in the list. Because of the way I presented the disks to the VM, the bootable drive (sdd) shows up at the end of the chain instead of the beginning, but that’s OK. It doesn’t change whether the disks are bootable.
We are going to use the Logical Volume Manager (LVM) to manage the disks for us instead of using regular partitions. On Windows-based servers, Windows has the ability to expand partitions without incurring any downtime, and we want the same flexibility on the Linux side. With elevated privileges (for the rest of the commands), let’s scan the drives to see what LVM detects.
Read the whole thing.
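The LVM workflow described follows the usual scan/create/extend pattern; a minimal sketch, with device names, volume names, and the filesystem all being assumptions for illustration:

```shell
# Scan for disks LVM can see (requires root):
sudo lvmdiskscan

# Create physical volumes, a volume group, and a logical volume
# (/dev/sdb and /dev/sdc are hypothetical data disks):
sudo pvcreate /dev/sdb /dev/sdc
sudo vgcreate datavg /dev/sdb /dev/sdc
sudo lvcreate -n datalv -l 100%FREE datavg

# Later, after presenting a new disk, the volume can be grown online --
# the no-downtime expansion the article is after:
sudo vgextend datavg /dev/sde
sudo lvextend -l +100%FREE /dev/datavg/datalv
sudo resize2fs /dev/datavg/datalv   # for ext4; use xfs_growfs <mountpoint> for XFS
```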
Remember, this disk is, as the title of this section says… TEMPORARY! Do NOT put ANYTHING on this drive you cannot afford to lose. And don’t say nobody warned you, because the drive itself contains a nice little txt file warning you. Here is the EXACT text:
WARNING : THIS IS A TEMPORARY DISK.
Any data stored on this drive is SUBJECT TO LOSS and THERE IS NO WAY TO RECOVER IT.
Please do not use this disk for storing any personal or application data.
It’s good to see what you need to do to get this working. I’ve found it just to be easier to set up a permanent SSD, but if you’re on a tight budget, this can save you some cash.
Last week in Part Two I went through how to create named volumes and map them to containers in order to persist data.
However, there is also another option available in the form of data volume containers.
The idea here is that we create a volume in a container and then mount that volume into another container. Confused? Yeah, me too, but it’s best to learn by doing, so let’s run through how to use these things now.
Read through for the step-by-step description.
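As a sketch of the pattern (the image and container names here are illustrative, not the ones from the linked post), the data-volume-container workflow looks like this:

```shell
# Create a container whose only job is to own a volume at /dbdata:
docker create -v /dbdata --name dbstore alpine /bin/true

# Mount that container's volume into another container and write to it:
docker run --rm --volumes-from dbstore alpine \
    sh -c 'echo hello > /dbdata/test.txt'

# Any other container started with --volumes-from dbstore sees the same data:
docker run --rm --volumes-from dbstore alpine cat /dbdata/test.txt
```

The dbstore container never runs; it exists only so other containers can reference its volume via `--volumes-from`.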
So it’s time for a quiz:
- If you put all of a SQL Server’s data files & logs on a single volume, how many failures will that server experience per year?
- Bonus question: what kinds of data loss and downtime will each of those failure(s) have?
- If you split a SQL Server’s data files onto one volume, and log files onto another volume, how many failures will that server experience per year?
- Bonus question: what kinds of data loss and downtime will each of those failures have?
Think carefully about the answers – or read the comments to see someone else’s homework, hahaha – before you move on.
With SANs, this advice is not even that good on the performance side—especially with modern SANs which don’t let you dedicate spindles. It definitely doesn’t fly on the reliability side.
Based on our experience, S3’s availability has been fantastic. Only twice in the last six years have we experienced S3 downtime, and we have never experienced data loss from S3.
Amazon claims 99.999999999% durability and 99.99% availability. Note that this is higher than the vast majority of organizations’ in-house services. The official SLA from Amazon can be found here: Service Level Agreement – Amazon Simple Storage Service (S3).
For HDFS, in contrast, it is difficult to estimate availability and durability. One could theoretically compute the two SLA attributes based on EC2’s mean time between failures (MTTF), plus upgrade and maintenance downtimes. In reality, those are difficult to quantify. Our understanding from working with customers is that the majority of Hadoop clusters have availability lower than 99.9%, i.e., more than roughly 9 hours of downtime per year.
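The downtime figures follow directly from the availability percentages; a quick check (assuming a 365-day year):

```python
# Hours of downtime per year implied by an availability percentage.
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability_pct: float) -> float:
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

print(f"99.99% (S3 SLA): {downtime_hours_per_year(99.99):.2f} h/yr")
print(f"99.9%  (Hadoop): {downtime_hours_per_year(99.9):.2f} h/yr")
```

99.9% availability works out to about 8.8 hours per year, consistent with the "at least 9 hours" figure for clusters that fall below that mark; S3's 99.99% SLA allows under an hour.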
It’s interesting how opinion has shifted; even a year ago, the recommendation would be different.