Press "Enter" to skip to content

Category: Storage

Azure Data Lake Store Best Practices

Ust Oldfield provides recommendations on how to size and lay out files in Azure Data Lake Store:

The format of the file has a huge implication for the storage and parallelisation. Splittable formats – files which are row oriented, such as CSV – are parallelizable as data does not span extents. Non-splittable formats, however, – files what are not row oriented and data is often delivered in blocks, such as XML or JSON – cannot be parallelized as data spans extents and can only be processed by a single vertex.

In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the data lake store, but the metadata is contained in the data lake analytics. For many, working with the structured data in the data lake is very similar to working with SQL databases.

This is the type of thing that you can easily forget about, but it makes a huge difference down the line.

Comments closed

Partition Alignment In Storage

Wayne Sheffield has an article on I/O partition alignment:

What we need to do is to offset the beginning of the data being stored on disk to a location more conducive to how the program is operating. This offset is known as the “Partition Alignment Offset”. To be in tune with SQL Server, this value should be an increment of 64KB. However, you also need to consider the entire storage subsystem – the disks, controllers and memory. Starting with Windows Server 2008, this offset is at 1024KB – a nice increment of 64KB that also works very nicely with most RAID disks/controllers. Prior to Windows Server 2008, partition alignment offset was not explicitly performed, so this will need to be performed.

If you’ve migrated disk from server to server to server over the years, this is worth checking out.

Comments closed

Cool Storage

James Serra talks about “cool storage” in Azure Blob Storage:

The access tiers available for blob storage accounts are “hot” and “cold”.  In general, hot data is classified as data that is accessed very frequently and needs to be highly durable and available.  On the other hand, cool data is data that is infrequently accessed and long-lived.  Cool data can tolerate a slightly lower availability, but still requires high durability and similar time to access and throughput characteristics as hot data.  For cool data, slightly lower availability SLA and higher access costs are acceptable tradeoffs for much lower storage costs.  Azure Blob storage now addresses this need for differentiated storage tiers for data with different access patterns and pricing model.  So you can now choose between Cool and Hot access tiers to store your less frequently accessed cool data at a lower storage cost, and store more frequently accessed hot data at a lower access cost.  The Access Tier attribute of hot or cold is set at an account level and applies to all objects in that account.  So if you want to have both a hot access tier and a cold access tier, you will need two accounts.  If there is a change in the usage pattern of your data, you can also switch between these access tiers at any time.

It looks like there shouldn’t be a performance difference between the two;  it’s more of a cost difference in which you might be able to save money by choosing your tier wisely.

Comments closed

LDF Stamp Change

The CSS SQL Server engineering team points out that the LDF stamp has changed from 0x00 to 0xC0:

Question:  If the log is stamped with 0xC0’s instead of 0x00’s how is it a performance gain?

Many of the new hardware implementations detect patterns of 0x00’s.   The space is acquired and zero’s written to stable media, then a background, hardware based garbage collector reclaims the blocks.

This is a very interesting background article which shows an integration pain point between the database platform and the storage platform.

Comments closed

Who Grew The Database?

Andy Galbraith helps us figure out who to blame for database growth:

I feel a little dirty writing about the Default Trace in the world of Extended Events, but I also know that many people simply don’t know how to use XEvents, and this can be faster if you already have it in your toolbox.  Also it will work back to SQL 2005 where XEvents were new in SQL 2008.

I have modified this several times to improve it – I started with a query from Tibor Karaszi (blog/@TiborKaraszi), modified it with some code from Jason Strate (blog/@StrateSQL), and then modified that myself for what is included and what is filtered.  There are links to both Tibor’s and Jason’s source material in the code below.

I usually just blame the BI team for database growth.

Comments closed

Measuring IOPs

Joey D’Antoni shows us how to measure IOPs (I/O Per Second) on a SQL Server instance/server:

That handy SQL Server:Resource Pool Stats counter and it’s Disk Read IO/sec and Disk Write IO/sec provide you with the data you need to give your SAN admin. In this screenshot this is an Enterprise Edition instance, and you can see my resource pools on the left side—so if you are using resource governor, you could use this to classify IO workload by application for potential chargeback situations.

Very useful, and when combined with Resource Governor, can help you throttle I/O effectively (as opposed to wildly flailing in the general direction of a fix).

Comments closed

Azure Storage Options

James Serra walks us through the list of storage options available on Azure:

Microsoft Azure is a cloud computing platform and infrastructure, created by Microsoft, for building, deploying and managing applications and services through a global network of Microsoft-managed and Microsoft partner-hosted datacenters.  Included in this platform are multiple ways of storing data.  Below I will give a brief overview of each so you can get a feel for the best use case for each, with links provided that go into more detail:

There are several options available, running the gamut from unstructured data (blob storage, file & disk storage), semi-structured data (data lake store), to structured data (Azure SQL Database) and a few points in between.

Comments closed

Storage Spaces Direct

Glenn Berry introduces us to Storage Spaces Direct:

One of the more exciting new features in Windows Server 2016 is called Storage Spaces Direct (S2D), which enables organizations to use multiple, clustered commodity file server nodes to build highly available, scalable storage systems with local storage, using SATA, SAS, or PCIe NVMe devices. You can use internal drives in each storage node, or direct-attached disk devices using “Just a Bunch of Disks” (JBOD) where each JBOD is only connected to a single storage node. This eliminates the previous requirement for a shared SAS fabric and its complexities (which was required with Windows Server 2012 R2 Storage Spaces and SOFS), and also enables using less expensive storage devices such as SATA disks.

This sounds like a technology with interesting potential, and not requiring SAS disks will make it more likely to be adopted.

Comments closed