Press "Enter" to skip to content

Category: Cloud

Using Azure Blob Storage Archive Tier For Archival Data

Bob Pusateri shows us how to configure Azure Blob Storage Archive Tier:

Two of the products I use extensively for this purpose are Amazon Glacier and, more recently, Microsoft Azure Blob Storage Archive Tier. As happy as I’ve been with Amazon Glacier since its introduction in 2012, I always hoped Microsoft would offer a similar service. My wish came true in Fall of 2017 when an archive tier of Azure Blob Storage was announced. Rather than branding this capability as a new product, Microsoft decided to present it as a new tier of Azure Blob Storage, alongside the existing hot and cool storage tiers.

A noticeable difference from the hot and cool storage tiers is that the archive storage tier is only available on a per-blob basis. While a storage account can be configured to have all blobs placed in either the hot or cool tier by default once they are uploaded, the archive tier is not an option. Once a blob is uploaded, it must explicitly be moved into the archive tier. If one is using the Azure Portal to do this, there’s several clicks involved per blob. The free Azure Storage Explorer client is no better. While I found several third party tools that can upload files to the archive tier, none were free. At this point, I decided to write my own method using Powershell, which I am happy to share below.

Read on for the script.  A good use for Azure Blob Storage Archive Tier would be storing old database backups which you have to keep around for compliance purposes but rarely use.

Comments closed

Using Azure Logic Apps For Database Tasks

Arun Sirpal shows off a technique he has developed to run maintenance jobs against Azure SQL Database databases:

I have been using Azure Logic apps recently to build some workflows to gather data from external sources ultimately inserting it into a database for reporting, I then got thinking, how can this be useful for “DBA” based tasks? Let’s take a step back for a minute, what are logic apps? It is a technology that helps integrate apps, data, systems, and services across enterprises. Key parts of a logic app solution are connectors, triggers and actions.

I decided that I wanted to execute a stored procedure every 6 hours to capture wait statistics for my Azure SQL Database and log the information in a table.

This is what my workflow looks like.

There are a few alternatives available, so it’s nice to see an example of one of them.

Comments closed

Auto-Scaling Amazon ElasticMapReduce

Brandon Schller gives us some tips on sizing and scaling ElasticMapReduce:

EMR scaling is more complex than simply adding or removing nodes from the cluster. One common misconception is that scaling in Amazon EMR works exactly like Amazon EC2 scaling. With EC2 scaling, you can add/remove nodes almost instantly and without worry, but EMR has more complexity to it, especially when scaling a cluster down. This is because important data or jobs could be running on your nodes.

To prevent data loss, Amazon EMR scaling ensures that your node has no running Apache Hadoop tasks or unique data that could be lost before removing your node. It is worth considering this decommissioning delay when resizing your EMR cluster. By understanding and accounting for how this process works, you can avoid issues that have plagued others, such as slow cluster resizes and inefficient automatic scaling policies.

If you’re using EMR today or think you might use it in the future, you should read this.

Comments closed

Azure Data Lake Store: Now On Blob Storage

James Serra announces Azure Data Lake Store Gen2:

Big news!  The next generation of Azure Data Lake Store (ADLS) has arrived.  See the official announcement.

In short, ADLS Gen2 is the combination of the current ADLS (now called Gen1) and Blob storage.  Gen2 is built on Blob storage.  By GA, ADLS Gen2 will have all the features of both, which means it will have features such as limitless storage capacity, support all Blob tiers (Hot, Cool, and Archive), the new lifecycle management feature, Azure Active Directory integration, hierarchical file system, and read-access geo-redundant storage.

A Gen2 capability is what is called “multi-modal” which means customers can use either Blob object store APIs or the new Gen2 file system APIs.  The key here is that both blob and file system semantics are now supported over the same data.

One very interesting thing to me is that Gen2 pricing is half of Gen1.

Comments closed

Testing Network Speed Between Azure Regions

Dave Bermingham has some quick inter-region tests for Azure network performance:

This is the question I asked myself today and of course I couldn’t find this documented anywhere. I’m assuming there is no guarantee and it probably depends on current utilization, etc. If I’m wrong, someone please point me to the documentation that states the available speed. I primarily looked here and here.

So I set up two Windows 2016 D4s v3 instances, one in Central US and one in East US 2, which are paired regions.

If you don’t know what peering is, it essentially lets you to easily connect two different Azure virtual networks. Peering is very easy to setup, just make sure you configure it from both Virtual Networks, I made that mistake at first. Once it is configured properly it will look something like this.

Read on for Dave’s results.

Comments closed

Backing Up Azure Data Lake Store Data

Hugo Almeida has some hints for backing up Azure Data Lake Store data using Azure Data Factory:

Our Hadoop HDP IaaS cluster on Azure uses Azure Data Lake Store (ADLS) for data repository and accesses it through an applicational user created on Azure Active Directory (AAD). Check this tutorial if you want to connect your own Hadoop to ADLS.

Our ADLS is getting bigger and we’re working on a backup strategy for it. ADLS provides locally-redundant storage (LRS), however, this does not prevent our application from corrupting data or accidentally deleting it. Since Microsoft hasn’t published a new version of ADLS with a clone feature we had to find a way to backup all the data stored in our data lake.

We’re going to show you How to do a full ADLS backup with Azure Data Factory (ADF). ADF does not preserve permissions. However, our Hadoop client can only access the AzureDataLakeStoreFilesystem (adl) through hive with a “hive” user and we can generate these permissions before the backup.

Read the whole thing if you’re thinking of using Azure Data Lake Store.

Comments closed

Alerting On Azure Data Lake Store Data Usage

Jose Lara shows off an interesting feature in Azure Data Lake Store:

The massive scale and capabilities of Azure Data Lake Store are regularly used by companies for big data storage. As the number of files, file types, and folders grow, things get harder to manage and staying compliant becomes a greater challenge for companies. Regulations such as GDPR (General Data Protection Regulation) have heightened requirements for control and supervision of files that contain sensitive data.

In this blog post, I’ll show you how to set up alerts in your Azure Data Lake Store to make managing your data easier. We will create a log analytics query and an alert that monitors a specific path and file type and sends a notification whenever the path or file is created, accessed, modified, or deleted.

Auditing access has historically been tricky, so it’s nice that they were able to get that in.

Comments closed

Scaling Azure Analysis Services

Chris Seferlis helps us make the decision between scaling up or out with Azure Analysis Services:

Some of you may not know when or how to scale up your queries or scale out your processing. Today I’d like to help with understanding when and how using Azure Analysis Services. First, you need to decide which tier you should be using. You can do that by looking at the QPUs (Query Processing Units) of each tier on Azure. Here’s a quick breakdown:

  • Developer Tier – gives you up to 20 QPUs

  • Basic Tier – is a mid-scale tier, not meant for heavy loads

  • Standard Tier (currently the highest available) – allows you more capability and flexibility

Read on for some pointers.

Comments closed

Clustered Columnstore Index Online Rebuild

Niko Neugebauer looks at a feature which will pop up in SQL Server vNext:

The current state of the Clustered Columnstore Index ONLINE rebuild points to be an unfinished version, which will definitely get vastly improved before being released & supported in SQL Server. I have seen a couple of deadlocks and canceled transactions and so I decided that this blog post will get updated as soon as there will be an official announcement of this feature.
If you are still looking to start working on this feature, then I would suggest trying it on smaller tables. Like really, really small ones.
Oh, and for online rebuild operation focus on using partition rebuild – you are using the partitioning, right ? 🙂

Niko gave this a try in Azure SQL Database, as there is no publicly available version of SQL Server which supports this.  I’ve been waiting for this feature for 3 years now, so I’ll be happy to see it in production.

Comments closed

Column-Level Security In Azure SQL Data Warehouse

Kavitha Jonnakuti announces a new feature for Azure SQL Data Warehouse:

Access to the table columns can be controlled based on the user’s execution context or their group membership with the standard GRANT T-SQL statement. To secure your data, you simply define a security policy via the GRANT statement to your table columns. For example, if you would like to limit access to PII data in your customers table, you can simply GRANT SELECT permissions on specific columns to the ContractEmp role:

GRANT SELECT ON dbo.Customers (CustomerId, FirstName, LastName) TO ContractEmp;

This capability is available now in all Azure regions with no additional charge.

This has been in regular SQL Server for a long time, so it’s good to see it make its way into Azure SQL Data Warehouse, and in a manner which doesn’t involve creating user-defined functions for predicates like Row-Level Security.

Comments closed