Press "Enter" to skip to content

Category: Storage

Persisting SQL Server Databases on Kubernetes

Anthony Nocentino walks us through persistent volumes and Kubernetes:

One of the key principles of Kubernetes is the ephemerality of Pods. No Pod is ever redeployed; a completely new Pod is created. If a Pod dies, for whatever reason, a new Pod is created in its place, and there is no continuity in the state of that Pod. The newly created Pod goes back to the initial state of the container image defined in the Pod’s spec. This is very valuable for stateless workloads, not so much for stateful workloads like SQL Server.

This means that for a stateful workload like SQL Server we need to store both configuration and data externally from the Pod to maintain state through the recreation of a Pod. Kubernetes gives us two constructs to do that: environment variables and Persistent Volumes.
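To make the idea concrete, here is a minimal sketch (not Anthony's exact scripts) of a PersistentVolumeClaim plus a Pod spec that passes configuration through environment variables and keeps state on a volume mounted at /var/opt/mssql. The names, image tag, and password are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-data             # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mssql
spec:
  containers:
    - name: mssql
      image: mcr.microsoft.com/mssql/server:2019-latest
      env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: MSSQL_SA_PASSWORD   # use a Secret in practice
          value: "S0me!Str0ngPwd"
      ports:
        - containerPort: 1433
      volumeMounts:
        - name: mssql-data
          mountPath: /var/opt/mssql  # system databases live here
  volumes:
    - name: mssql-data
      persistentVolumeClaim:
        claimName: mssql-data
```

If the Pod is deleted and recreated with this spec, the new container reattaches the databases on the claim rather than starting over from the image's initial state.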

Read on for a good bit of background and a few scripts to help you get started.

Persistent Storage and Kubernetes

Chris Adkin explains the concepts behind persistent storage in containers:

A question that often crops up is “Can I use local storage?” The answer is “It depends.” Kubernetes is essentially a container scheduler at its most basic and fundamental level. The ‘Pod’ is the unit of scheduling; containers in the same pod share the same life cycle and always run on the same node. For stateless pods life is reasonably simple and straightforward; for stateful pods, life is a bit more nuanced. If for any reason a node fails, the pods that ran on that node have to be rescheduled to run on a working node, and their storage needs to follow them. This involves un-mounting the volume from the failed node and then mounting it on the node the pod(s) are rescheduled to run on. With basic vanilla hyper-converged storage, i.e. storage and compute in the same chassis, this will ultimately lead to scheduling problems. However, software-defined solutions exist that turn this kind of infrastructure into a storage cluster, which allows state to follow pods around the cluster. Some people automatically associate HDFS with local storage; the reason for this is probably that “back in the day,” the most cost-efficient way for Google to scale out its infrastructure was via commodity servers with local disks.
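To see why plain local storage pins state to a node, here is a sketch of a Kubernetes local PersistentVolume (names and paths are hypothetical). The required nodeAffinity block is what ties any pod using this volume to node-1, so if that node dies the volume cannot follow the pod elsewhere:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1        # a disk physically in this node's chassis
  nodeAffinity:                  # the volume only exists on node-1
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node-1"]
```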

Read the whole thing.

Persisting Databases with Docker Named Volumes

Andrew Pruski shows us how to use named volumes to keep containerized SQL Server databases hanging around:

The mssqluser named volume is going to be mounted as /var/opt/sqlserver and the mssqlsystem volume is going to be mounted as /var/opt/mssql. This is the key to the databases automatically being attached in the new container: /var/opt/mssql is the location of the system databases.

If we didn’t mount a named volume for the system databases, any changes to those databases (particularly the master database) would not be persisted, so the new container would have no record of any user databases created.
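A minimal sketch of the pattern, using the volume names and mount points from the post (the image tag and password are placeholders):

```bash
# Create the two named volumes
docker volume create mssqluser
docker volume create mssqlsystem

# Run SQL Server with user databases on one volume and system databases on the other
docker run -d --name sql1 \
  -e ACCEPT_EULA=Y -e MSSQL_SA_PASSWORD='S0me!Str0ngPwd' \
  -p 1433:1433 \
  -v mssqluser:/var/opt/sqlserver \
  -v mssqlsystem:/var/opt/mssql \
  mcr.microsoft.com/mssql/server:2019-latest

# Destroy the container; the volumes (and therefore the databases) survive
docker rm -f sql1

# A new container mounting the same volumes picks the databases right back up
docker run -d --name sql2 \
  -e ACCEPT_EULA=Y -e MSSQL_SA_PASSWORD='S0me!Str0ngPwd' \
  -p 1433:1433 \
  -v mssqluser:/var/opt/sqlserver \
  -v mssqlsystem:/var/opt/mssql \
  mcr.microsoft.com/mssql/server:2019-latest
```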

Read the whole thing.

Azure Blob Storage and Data Lake Storage Gen2

Melissa Coates shows what you need to know about Azure Blob Storage with Azure Data Lake Storage Gen2:

– You may need to consider separate storage accounts if you need to segregate access control (RBAC), virtual networks, access keys, and the like. (Note that RBAC can also be set at the container level, but ACL-type permissions only apply to ADLS Gen2 and not to blob storage.)
– If you don’t need the hierarchical namespace at all (for non-analytical use cases), that could mean a separate storage account. The storage cost is the same, but transaction costs are higher when the HNS is enabled (discussed in item #8 of this post).
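As a sketch of that separation with the Azure CLI (account and resource group names are hypothetical), the only difference between the two accounts is the flag that enables the hierarchical namespace:

```bash
# ADLS Gen2 account: StorageV2 with the hierarchical namespace enabled
az storage account create \
  --name adlsgen2acct --resource-group my-rg --location eastus \
  --sku Standard_LRS --kind StorageV2 --hns true

# Plain blob account for non-analytical data: same command, no HNS
az storage account create \
  --name blobacct --resource-group my-rg --location eastus \
  --sku Standard_LRS --kind StorageV2
```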

Click through for more details, including several more tips about Azure Storage Accounts, Azure Blob Storage Containers, and the Azure Storage Blobs themselves.

When Faster Disk Increases WRITELOG Waits

Paul Randal explains that WRITELOG waits can potentially increase as you get faster disk:

I was contacted last week by someone who was confused about the WRITELOG wait type. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a RAID-1 array, using locally attached spinning disks in the server. They figured that by moving the log to a RAID-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.

They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain.
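You can see both numbers in play here (how often WRITELOG occurs, and how long each wait lasts on average) with a standard query against the wait-stats DMV:

```sql
-- Average WRITELOG wait = total wait time / number of waits.
-- A faster log disk drives avg_wait_ms down even as waiting_tasks_count climbs.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms * 1.0 / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WRITELOG';
```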

Read on for Paul’s explanation of why this is not a scary situation, nor a particularly weird one. SQL Server performance is a complicated thing, and trying to reduce it to one measure or one query can lead you down the wrong path.

Automating Azure Storage To Move Between Tiers

Ryan Adams built a process to save money on storage costs for a customer’s test environment:

One of the best things about Azure, and the cloud in general, is that we can automate almost anything, and we are going to look at how to automate Azure VM storage. This allows us to come up with some outside-of-the-box solutions. I had a customer with a roadblock that we were able to work around by automating some things with their Azure Virtual Machines.

Their challenge was that they wanted to move their test and development environments to Azure, but the storage cost was prohibitive. They needed premium storage to mimic their production environment, but it was not financially viable for test and development, so they were going to keep it all on premises. During our conversations I learned that they only test between 8 AM and 5 PM, Monday through Friday. My suggestion was that we put their databases on cheaper storage during off hours and only on premium storage when they are actively using it.
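Ryan's scripts take their own approach; purely as an illustration of the same idea with the current Az PowerShell module, a managed disk's SKU can be flipped between Standard and Premium while the VM is deallocated (all names here are hypothetical):

```powershell
$rg   = 'test-rg'
$vm   = 'test-sql-vm'
$disk = 'test-sql-data'

# Evening: deallocate the VM, then drop the data disk to cheaper Standard storage
Stop-AzVM -ResourceGroupName $rg -Name $vm -Force
New-AzDiskUpdateConfig -SkuName 'Standard_LRS' |
    Update-AzDisk -ResourceGroupName $rg -DiskName $disk

# Morning: move the disk back to Premium and start the VM for the test day
New-AzDiskUpdateConfig -SkuName 'Premium_LRS' |
    Update-AzDisk -ResourceGroupName $rg -DiskName $disk
Start-AzVM -ResourceGroupName $rg -Name $vm
```

Scheduled morning and evening (with Azure Automation, for example), this pays premium rates only during the 8-to-5 testing window.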

This doesn’t look like a one-hour task, but if you’re in need of some cost savings on storage in non-production environments, check out Ryan’s scripts.

Where To Store SQL Server DB Files In Azure

James Serra gives us a few options for storing database files (that is, your MDF, NDF, and LDF files) when running SQL Server in an Azure VM:

If using managed disks, the recommendation is to attach several premium SSDs to a VM.  From Sizes for Windows virtual machines in Azure you can see that for the size DS5_v2 you can add 64 × 4 TB data disks (P50), yielding a potential total volume size of up to 256 TB per VM, and if you use the preview sizes (P80), your application can have up to around 2 PB of storage per VM (64 × 32 TB disks).  With premium storage disks, your applications can achieve 80,000 I/O operations per second (IOPS) per VM and disk throughput of up to 2,000 megabytes per second (MB/s) per VM.  Note that the more disks you add, the more storage and IOPS you get, but only up to a certain point due to VM limits; at some point more disks get you more storage but not more IOPS.
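As a sketch of the first step with the Azure CLI (names are hypothetical; 4,095 GB is the P50 size), after which you would stripe the disks into a single volume inside the guest to aggregate their IOPS:

```bash
# Attach two new premium P50-sized data disks to the VM
for i in 0 1; do
  az vm disk attach \
    --resource-group my-rg \
    --vm-name sql-vm \
    --name "sql-data-$i" \
    --new \
    --size-gb 4095 \
    --sku Premium_LRS
done
```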

But there are a few other options too, so check them out.

Could Not Clear Differential Bitmap

Jack Vamvas takes us through a reason why you might get error 3041:

An error message has started appearing in the SQL Server Error Logs during a nightly full backup.

Could not clear ‘DIFFERENTIAL’ bitmap in database ‘Database1’ because of error 9002. As a result, the differential or bulk-logged bitmap overstates the amount of change that will occur with the next differential or log backup. This discrepancy might slow down later differential or log backup operations and cause the backup sets to be larger than necessary. Typically, the cause of this error is insufficient resources. Investigate the failure and resolve the cause. If the error occurred on a data backup, consider taking a data backup to create a new base for future differential backups.
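Error 9002, named in the message, is the “transaction log full” error, so a natural first check is why the log cannot clear:

```sql
-- What is preventing log truncation for the affected database?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'Database1';

-- How full is each log right now?
DBCC SQLPERF (LOGSPACE);
```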

Click through for the root cause and solution.

SQL Server Data File Size Limit: 16 TB

Dave Mason learned the hard way that data files max out at 16 TB in SQL Server:

I could see that most of the stored proc execution calls were coming from the same server, which was a processing server running a web service. The customer decided to shut down the service and have a quick discussion with his dev team to see if there had been any recent changes deployed. We called it a night. I got a text the next morning. The server was rebooted, and performance had improved noticeably. Not to my surprise, I got another call later that day: performance had deteriorated again.
I got back online, looked at all the same things I’d looked at before, and was still puzzled.

I think this is a case where Swart’s Ten Percent Rule is easier to violate than most: terabyte-sized databases are fairly common these days, though most of them probably have multiple data files to help with piecemeal recovery.
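If you want to check how close your own files are to the cap, sys.master_files reports size in 8 KB pages, and 16 TB works out to 2,147,483,648 pages:

```sql
-- Data file sizes as a percentage of the 16 TB per-file limit
SELECT DB_NAME(database_id)          AS database_name,
       name                          AS logical_file_name,
       size / 128 / 1024             AS size_gb,           -- pages -> MB -> GB
       size * 100.0 / 2147483648     AS pct_of_16tb_limit
FROM sys.master_files
WHERE type_desc = N'ROWS'
ORDER BY size DESC;
```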

Erasure Coding In Hadoop

Guy Shilo explains erasure coding, a new feature in Hadoop 3:

The benefits are, of course, space savings, and for large files also improved performance (blocks striped across datanodes can be read in parallel, and fewer blocks are written because there is no 3x replication). The larger the file, the more notable the performance gain.

Erasure coding is disabled by default, and you can enable it for only certain directories in HDFS. Some articles like this one suggest that best practice is to enable erasure coding only for “cold” data that you do not write often, and to use regular replication for “hot” data. However, in my tests I did not witness any problems dealing with hot data (maybe it’s evident at larger scales).
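In Hadoop 3 this is driven through the hdfs ec subcommand; a short sketch of enabling the default Reed-Solomon policy on a directory of cold data (the path is hypothetical):

```bash
# See which policies ship with the cluster, then enable one
hdfs ec -listPolicies
hdfs ec -enablePolicy -policy RS-6-3-1024k

# Apply it to a directory; files written there afterwards are erasure coded
hdfs dfs -mkdir -p /data/cold
hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k

# Confirm the directory's policy
hdfs ec -getPolicy -path /data/cold
```

With RS-6-3, every six data blocks carry three parity blocks (a 1.5x raw footprint) instead of three full copies (3x), which is where the space saving comes from.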

Click through for the full story on how it works.
