Recommendations For Storage On Azure SQL DB Managed Instances

Dimitri Furman has some thoughts on database storage architecture for Azure SQL Database Managed Instances:

MI GP uses Azure Premium Storage to store database files for all databases, except for the tempdb database. From the perspective of the database engine, this storage type is remote, i.e. it is accessed over the network, using Azure network infrastructure. To use Azure Premium Storage, MI GP takes advantage of SQL Server native capability to use database files directly in Azure Blob Storage. This means that there is not a disk or a network share that hosts database files; instead, file path is an HTTPS URL, and each database file is a page blob in Azure Blob Storage.

Since Azure Premium Storage is used, its performance characteristics, limits, and scalability goals fully apply to MI GP. The High-performance Premium Storage and managed disks for VMs documentation article includes a section describing Premium Storage disk limits. While the topic is written in the context of VMs and Azure disks, which is the most common usage scenario for Azure Premium Storage, the documented limits are also applicable to blobs. As shown in the limits table in the documentation, the size of the blob determines the maximum IOPS and throughput that can be achieved against the blob. For MI GP, this means that the size of a database file determines the maximum IOPS and throughput that is achievable against the file.

The disk/blob size shown in the limits table is the maximum size for which the corresponding limit applies. For example, a blob that is > 64 GB and <= 128 GB (equivalent to a P10 disk) can achieve up to 500 IOPS and up to 100 MB/second throughput.

Read the whole thing if you’re looking at Managed Instances, but there are some tips for SQL Server in Azure IaaS.

Creating SQL Server Images In Azure Container Registry

Andrew Pruski shows us how to save Docker container images to the Azure Container Registry using Powershell:

Awesome! Our custom image is in our ACR!

But has it worked? Has it really? Oh ye of little faith…

I guess the only way to find out is to run a container! So let’s run a Azure Container Instance from our new image.

Spoilers:  it worked.

New Features In Public Preview On Azure SQL Database

Microsoft has a round of announcements for public previews on Azure SQL Database.  First up is Kevin Farlee announcing approximate count distinct:

The new APPROX_COUNT_DISTINCT aggregate function returns the approximate number of unique non-null values in a group.

This function is designed for use in big data scenarios and is optimized for the following conditions:

  • Access of data sets that are millions of rows or higher AND
  • Aggregation of a column or columns that have a large number of distinct values

Assuming these conditions, the accuracy will be within 2% of the precise result for a majority of workloads.

I’m liking this change.  Sometimes I simply need an approximate number  but I want it fast.

Shreya Verma announces MATCH support in the MERGE operator:

We will be further expanding the graph database capabilities with several new features. In this blog we will discuss one of these features that is now available for public preview in Azure SQL Database, MATCH support in MERGE DML for graph tables.

The MERGE statement performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in a target table based on differences between the target table and the source table. Using MATCH predicates in a MERGE statement is now supported on Azure SQL Database. That is, it is now possible to merge your current graph data (node or edge tables) with new data using the MATCH predicates to specify graph relationships in a single statement, instead of separate INSERT/UPDATE/DELETE statements.

I’ll use that approximately the day they fix all of the bugs with the MERGE operator.

Joe Sack announces row mode memory grant feedback:

In Azure SQL Database, we are further expanding query processing capabilities with several new features under the Intelligent Query Processing (QP) feature family.  In this blog post we’ll discuss one of these Intelligent QP features that is now available in public preview, row mode memory grant feedback.  Row mode memory grant feedback expands on the memory grant feedback feature by adjusting memory grant sizes for both batch and row mode operators.

Key feature benefits:

  • Reduce wasted memory. For an excessive memory grant condition, if the granted memory is more than two times the size of the actual used memory, memory grant feedback will recalculate the memory grant. Consecutive executions will then request less memory.

  • Decrease spills to disk. For an insufficiently sized memory grant that results in a spill to disk, memory grant feedback will trigger a recalculation of the memory grant. Consecutive executions will then request more memory.

This was big for batch mode operators, and I’m happy to see it move to row mode operators as well.

Finally, Joe also announces table variable deferred compilation:

In Azure SQL Database, we will be further expanding query processing capabilities with several new features under the Intelligent Query Processing (QP) feature family.  In this blog post we’ll discuss one of these Intelligent QP features that is now available in public preview in Azure SQL Database, table variable deferred compilation.

Table variable deferred compilation improves plan quality and overall performance for queries referencing table variables. During optimization and initial compilation, this feature will propagate cardinality estimates that are based on actual table variable row counts.  This accurate row count information will be used for optimizing downstream plan operations.

This one has the potential to be a pretty big performance improvement as well.

Using Azure Blob Storage Archive Tier For Archival Data

Bob Pusateri shows us how to configure Azure Blob Storage Archive Tier:

Two of the products I use extensively for this purpose are Amazon Glacier and, more recently, Microsoft Azure Blob Storage Archive Tier. As happy as I’ve been with Amazon Glacier since its introduction in 2012, I always hoped Microsoft would offer a similar service. My wish came true in Fall of 2017 when an archive tier of Azure Blob Storage was announced. Rather than branding this capability as a new product, Microsoft decided to present it as a new tier of Azure Blob Storage, alongside the existing hot and cool storage tiers.

A noticeable difference from the hot and cool storage tiers is that the archive storage tier is only available on a per-blob basis. While a storage account can be configured to have all blobs placed in either the hot or cool tier by default once they are uploaded, the archive tier is not an option. Once a blob is uploaded, it must explicitly be moved into the archive tier. If one is using the Azure Portal to do this, there’s several clicks involved per blob. The free Azure Storage Explorer client is no better. While I found several third party tools that can upload files to the archive tier, none were free. At this point, I decided to write my own method using Powershell, which I am happy to share below.

Read on for the script.  A good use for Azure Blob Storage Archive Tier would be storing old database backups which you have to keep around for compliance purposes but rarely use.

Using Azure Logic Apps For Database Tasks

Arun Sirpal shows off a technique he has developed to run maintenance jobs against Azure SQL Database databases:

I have been using Azure Logic apps recently to build some workflows to gather data from external sources ultimately inserting it into a database for reporting, I then got thinking, how can this be useful for “DBA” based tasks? Let’s take a step back for a minute, what are logic apps? It is a technology that helps integrate apps, data, systems, and services across enterprises. Key parts of a logic app solution are connectors, triggers and actions.

I decided that I wanted to execute a stored procedure every 6 hours to capture wait statistics for my Azure SQL Database and log the information in a table.

This is what my workflow looks like.

There are a few alternatives available, so it’s nice to see an example of one of them.

Auto-Scaling Amazon ElasticMapReduce

Brandon Schller gives us some tips on sizing and scaling ElasticMapReduce:

EMR scaling is more complex than simply adding or removing nodes from the cluster. One common misconception is that scaling in Amazon EMR works exactly like Amazon EC2 scaling. With EC2 scaling, you can add/remove nodes almost instantly and without worry, but EMR has more complexity to it, especially when scaling a cluster down. This is because important data or jobs could be running on your nodes.

To prevent data loss, Amazon EMR scaling ensures that your node has no running Apache Hadoop tasks or unique data that could be lost before removing your node. It is worth considering this decommissioning delay when resizing your EMR cluster. By understanding and accounting for how this process works, you can avoid issues that have plagued others, such as slow cluster resizes and inefficient automatic scaling policies.

If you’re using EMR today or think you might use it in the future, you should read this.

Azure Data Lake Store: Now On Blob Storage

James Serra announces Azure Data Lake Store Gen2:

Big news!  The next generation of Azure Data Lake Store (ADLS) has arrived.  See the official announcement.

In short, ADLS Gen2 is the combination of the current ADLS (now called Gen1) and Blob storage.  Gen2 is built on Blob storage.  By GA, ADLS Gen2 will have all the features of both, which means it will have features such as limitless storage capacity, support all Blob tiers (Hot, Cool, and Archive), the new lifecycle management feature, Azure Active Directory integration, hierarchical file system, and read-access geo-redundant storage.

A Gen2 capability is what is called “multi-modal” which means customers can use either Blob object store APIs or the new Gen2 file system APIs.  The key here is that both blob and file system semantics are now supported over the same data.

One very interesting thing to me is that Gen2 pricing is half of Gen1.

Testing Network Speed Between Azure Regions

Dave Bermingham has some quick inter-region tests for Azure network performance:

This is the question I asked myself today and of course I couldn’t find this documented anywhere. I’m assuming there is no guarantee and it probably depends on current utilization, etc. If I’m wrong, someone please point me to the documentation that states the available speed. I primarily looked here and here.

So I set up two Windows 2016 D4s v3 instances, one in Central US and one in East US 2, which are paired regions.

If you don’t know what peering is, it essentially lets you to easily connect two different Azure virtual networks. Peering is very easy to setup, just make sure you configure it from both Virtual Networks, I made that mistake at first. Once it is configured properly it will look something like this.

Read on for Dave’s results.

Backing Up Azure Data Lake Store Data

Hugo Almeida has some hints for backing up Azure Data Lake Store data using Azure Data Factory:

Our Hadoop HDP IaaS cluster on Azure uses Azure Data Lake Store (ADLS) for data repository and accesses it through an applicational user created on Azure Active Directory (AAD). Check this tutorial if you want to connect your own Hadoop to ADLS.

Our ADLS is getting bigger and we’re working on a backup strategy for it. ADLS provides locally-redundant storage (LRS), however, this does not prevent our application from corrupting data or accidentally deleting it. Since Microsoft hasn’t published a new version of ADLS with a clone feature we had to find a way to backup all the data stored in our data lake.

We’re going to show you How to do a full ADLS backup with Azure Data Factory (ADF). ADF does not preserve permissions. However, our Hadoop client can only access the AzureDataLakeStoreFilesystem (adl) through hive with a “hive” user and we can generate these permissions before the backup.

Read the whole thing if you’re thinking of using Azure Data Lake Store.

Alerting On Azure Data Lake Store Data Usage

Jose Lara shows off an interesting feature in Azure Data Lake Store:

The massive scale and capabilities of Azure Data Lake Store are regularly used by companies for big data storage. As the number of files, file types, and folders grow, things get harder to manage and staying compliant becomes a greater challenge for companies. Regulations such as GDPR (General Data Protection Regulation) have heightened requirements for control and supervision of files that contain sensitive data.

In this blog post, I’ll show you how to set up alerts in your Azure Data Lake Store to make managing your data easier. We will create a log analytics query and an alert that monitors a specific path and file type and sends a notification whenever the path or file is created, accessed, modified, or deleted.

Auditing access has historically been tricky, so it’s nice that they were able to get that in.


July 2018
« Jun