Category: Cloud

With ADLA Task in Azure Feature Pack, you can now orchestrate and create U-SQL jobs as a part of the SSIS workflow to process big data in the cloud. As ADLA is a serverless analytics service, you don’t need to worry about cluster creation and initialization, all you need is an ADLA account to start your analytics.

You can get the U-SQL script from different places by using SSIS built-in functions. You can:

Edit the inline U-SQL script in ADLA Task to call table valued functions and stored procedures in your U-SQL databases.
Use the U-SQL files stored in ADLS or Azure Blob Storage by leveraging Azure Data Lake Store File System Task and Azure Blob Download Task.
Use the U-SQL files from local file directly using SSIS File Connection Manager.
Use an SSIS variable that contains the U-SQL statements. You can also use SSIS expression to generate the U-SQL statements dynamically.

Read on for more information and a link to download the pack.

Comments closed

Tracking Latency To Azure With PsPing

Published 2018-07-24 by Kevin Feasel

Arun Sirpal shows us how to use PsPing (part of the Sysinternals tool set) to determine latency between your computer and a VM in an Azure data center:

This is the tool of choice when wanting to find out latency to your Azure SQL Server. In addition to standard ICMP ping functionality, it can report the latency of connecting to TCP ports, the latency of TCP round-trip communication.

I use this to find the latency from my location to various Azure SQL Servers which are in different Azure regions. I am based in the heart of England so let’s look and compare a couple of locations (just out of curiosity). Once you have downloaded the tool you will need to CD to the directory and call the following command.

Read on to see how to use PsPing.

Comments closed

Recommendations For Storage On Azure SQL DB Managed Instances

Published 2018-07-23 by Kevin Feasel

Dimitri Furman has some thoughts on database storage architecture for Azure SQL Database Managed Instances:

MI GP uses Azure Premium Storage to store database files for all databases, except for the tempdb database. From the perspective of the database engine, this storage type is remote, i.e. it is accessed over the network, using Azure network infrastructure. To use Azure Premium Storage, MI GP takes advantage of SQL Server native capability to use database files directly in Azure Blob Storage. This means that there is not a disk or a network share that hosts database files; instead, file path is an HTTPS URL, and each database file is a page blob in Azure Blob Storage.

Since Azure Premium Storage is used, its performance characteristics, limits, and scalability goals fully apply to MI GP. The High-performance Premium Storage and managed disks for VMs documentation article includes a section describing Premium Storage disk limits. While the topic is written in the context of VMs and Azure disks, which is the most common usage scenario for Azure Premium Storage, the documented limits are also applicable to blobs. As shown in the limits table in the documentation, the size of the blob determines the maximum IOPS and throughput that can be achieved against the blob. For MI GP, this means that the size of a database file determines the maximum IOPS and throughput that is achievable against the file.

The disk/blob size shown in the limits table is the maximum size for which the corresponding limit applies. For example, a blob that is > 64 GB and <= 128 GB (equivalent to a P10 disk) can achieve up to 500 IOPS and up to 100 MB/second throughput.

Read the whole thing if you’re looking at Managed Instances, but there are some tips for SQL Server in Azure IaaS.

Comments closed

Creating SQL Server Images In Azure Container Registry

Published 2018-07-18 by Kevin Feasel

Andrew Pruski shows us how to save Docker container images to the Azure Container Registry using Powershell:

Awesome! Our custom image is in our ACR!

But has it worked? Has it really? Oh ye of little faith…

I guess the only way to find out is to run a container! So let’s run a Azure Container Instance from our new image.

Spoilers: it worked.

Comments closed

New Features In Public Preview On Azure SQL Database

Published 2018-07-17 by Kevin Feasel

Microsoft has a round of announcements for public previews on Azure SQL Database. First up is Kevin Farlee announcing approximate count distinct:

The new APPROX_COUNT_DISTINCT aggregate function returns the approximate number of unique non-null values in a group.

This function is designed for use in big data scenarios and is optimized for the following conditions:

Access of data sets that are millions of rows or higher AND

Aggregation of a column or columns that have a large number of distinct values

Assuming these conditions, the accuracy will be within 2% of the precise result for a majority of workloads.

I’m liking this change. Sometimes I simply need an approximate number but I want it fast.

Shreya Verma announces MATCH support in the MERGE operator:

We will be further expanding the graph database capabilities with several new features. In this blog we will discuss one of these features that is now available for public preview in Azure SQL Database, MATCH support in MERGE DML for graph tables.

The MERGE statement performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in a target table based on differences between the target table and the source table. Using MATCH predicates in a MERGE statement is now supported on Azure SQL Database. That is, it is now possible to merge your current graph data (node or edge tables) with new data using the MATCH predicates to specify graph relationships in a single statement, instead of separate INSERT/UPDATE/DELETE statements.

I’ll use that approximately the day they fix all of the bugs with the MERGE operator.

Joe Sack announces row mode memory grant feedback:

In Azure SQL Database, we are further expanding query processing capabilities with several new features under the Intelligent Query Processing (QP) feature family. In this blog post we’ll discuss one of these Intelligent QP features that is now available in public preview, row mode memory grant feedback. Row mode memory grant feedback expands on the memory grant feedback feature by adjusting memory grant sizes for both batch and row mode operators.

Key feature benefits:

Reduce wasted memory. For an excessive memory grant condition, if the granted memory is more than two times the size of the actual used memory, memory grant feedback will recalculate the memory grant. Consecutive executions will then request less memory.
Decrease spills to disk. For an insufficiently sized memory grant that results in a spill to disk, memory grant feedback will trigger a recalculation of the memory grant. Consecutive executions will then request more memory.

This was big for batch mode operators, and I’m happy to see it move to row mode operators as well.

Finally, Joe also announces table variable deferred compilation:

In Azure SQL Database, we will be further expanding query processing capabilities with several new features under the Intelligent Query Processing (QP) feature family. In this blog post we’ll discuss one of these Intelligent QP features that is now available in public preview in Azure SQL Database, table variable deferred compilation.

Table variable deferred compilation improves plan quality and overall performance for queries referencing table variables. During optimization and initial compilation, this feature will propagate cardinality estimates that are based on actual table variable row counts. This accurate row count information will be used for optimizing downstream plan operations.

This one has the potential to be a pretty big performance improvement as well.

Comments closed

Using Azure Blob Storage Archive Tier For Archival Data

Published 2018-07-16 by Kevin Feasel

Bob Pusateri shows us how to configure Azure Blob Storage Archive Tier:

Two of the products I use extensively for this purpose are Amazon Glacier and, more recently, Microsoft Azure Blob Storage Archive Tier. As happy as I’ve been with Amazon Glacier since its introduction in 2012, I always hoped Microsoft would offer a similar service. My wish came true in Fall of 2017 when an archive tier of Azure Blob Storage was announced. Rather than branding this capability as a new product, Microsoft decided to present it as a new tier of Azure Blob Storage, alongside the existing hot and cool storage tiers.

A noticeable difference from the hot and cool storage tiers is that the archive storage tier is only available on a per-blob basis. While a storage account can be configured to have all blobs placed in either the hot or cool tier by default once they are uploaded, the archive tier is not an option. Once a blob is uploaded, it must explicitly be moved into the archive tier. If one is using the Azure Portal to do this, there’s several clicks involved per blob. The free Azure Storage Explorer client is no better. While I found several third party tools that can upload files to the archive tier, none were free. At this point, I decided to write my own method using Powershell, which I am happy to share below.

Read on for the script. A good use for Azure Blob Storage Archive Tier would be storing old database backups which you have to keep around for compliance purposes but rarely use.

Comments closed

Using Azure Logic Apps For Database Tasks

Published 2018-07-10 by Kevin Feasel

Arun Sirpal shows off a technique he has developed to run maintenance jobs against Azure SQL Database databases:

I have been using Azure Logic apps recently to build some workflows to gather data from external sources ultimately inserting it into a database for reporting, I then got thinking, how can this be useful for “DBA” based tasks? Let’s take a step back for a minute, what are logic apps? It is a technology that helps integrate apps, data, systems, and services across enterprises. Key parts of a logic app solution are connectors, triggers and actions.

I decided that I wanted to execute a stored procedure every 6 hours to capture wait statistics for my Azure SQL Database and log the information in a table.

This is what my workflow looks like.

There are a few alternatives available, so it’s nice to see an example of one of them.

Comments closed

Auto-Scaling Amazon ElasticMapReduce

Published 2018-07-03 by Kevin Feasel

Brandon Schller gives us some tips on sizing and scaling ElasticMapReduce:

EMR scaling is more complex than simply adding or removing nodes from the cluster. One common misconception is that scaling in Amazon EMR works exactly like Amazon EC2 scaling. With EC2 scaling, you can add/remove nodes almost instantly and without worry, but EMR has more complexity to it, especially when scaling a cluster down. This is because important data or jobs could be running on your nodes.

To prevent data loss, Amazon EMR scaling ensures that your node has no running Apache Hadoop tasks or unique data that could be lost before removing your node. It is worth considering this decommissioning delay when resizing your EMR cluster. By understanding and accounting for how this process works, you can avoid issues that have plagued others, such as slow cluster resizes and inefficient automatic scaling policies.

If you’re using EMR today or think you might use it in the future, you should read this.

Comments closed

Azure Data Lake Store: Now On Blob Storage

Published 2018-07-03 by Kevin Feasel

James Serra announces Azure Data Lake Store Gen2:

Big news! The next generation of Azure Data Lake Store (ADLS) has arrived. See the official announcement.

In short, ADLS Gen2 is the combination of the current ADLS (now called Gen1) and Blob storage. Gen2 is built on Blob storage. By GA, ADLS Gen2 will have all the features of both, which means it will have features such as limitless storage capacity, support all Blob tiers (Hot, Cool, and Archive), the new lifecycle management feature, Azure Active Directory integration, hierarchical file system, and read-access geo-redundant storage.

A Gen2 capability is what is called “multi-modal” which means customers can use either Blob object store APIs or the new Gen2 file system APIs. The key here is that both blob and file system semantics are now supported over the same data.

One very interesting thing to me is that Gen2 pricing is half of Gen1.

Comments closed

Testing Network Speed Between Azure Regions

Published 2018-07-03 by Kevin Feasel

Dave Bermingham has some quick inter-region tests for Azure network performance:

This is the question I asked myself today and of course I couldn’t find this documented anywhere. I’m assuming there is no guarantee and it probably depends on current utilization, etc. If I’m wrong, someone please point me to the documentation that states the available speed. I primarily looked here and here.

So I set up two Windows 2016 D4s v3 instances, one in Central US and one in East US 2, which are paired regions.

If you don’t know what peering is, it essentially lets you to easily connect two different Azure virtual networks. Peering is very easy to setup, just make sure you configure it from both Virtual Networks, I made that mistake at first. Once it is configured properly it will look something like this.

Read on for Dave’s results.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31