Category: Cloud

Azure Data Lake Store: Now On Blob Storage

Published 2018-07-03 by Kevin Feasel

James Serra announces Azure Data Lake Store Gen2:

Big news! The next generation of Azure Data Lake Store (ADLS) has arrived. See the official announcement.

In short, ADLS Gen2 is the combination of the current ADLS (now called Gen1) and Blob storage. Gen2 is built on Blob storage. By GA, ADLS Gen2 will have all the features of both, which means it will have features such as limitless storage capacity, support all Blob tiers (Hot, Cool, and Archive), the new lifecycle management feature, Azure Active Directory integration, hierarchical file system, and read-access geo-redundant storage.

A Gen2 capability is what is called “multi-modal” which means customers can use either Blob object store APIs or the new Gen2 file system APIs. The key here is that both blob and file system semantics are now supported over the same data.

One very interesting thing to me is that Gen2 pricing is half of Gen1.

Comments closed

Testing Network Speed Between Azure Regions

Published 2018-07-03 by Kevin Feasel

Dave Bermingham has some quick inter-region tests for Azure network performance:

This is the question I asked myself today and of course I couldn’t find this documented anywhere. I’m assuming there is no guarantee and it probably depends on current utilization, etc. If I’m wrong, someone please point me to the documentation that states the available speed. I primarily looked here and here.

So I set up two Windows 2016 D4s v3 instances, one in Central US and one in East US 2, which are paired regions.

If you don’t know what peering is, it essentially lets you to easily connect two different Azure virtual networks. Peering is very easy to setup, just make sure you configure it from both Virtual Networks, I made that mistake at first. Once it is configured properly it will look something like this.

Read on for Dave’s results.

Comments closed

Backing Up Azure Data Lake Store Data

Published 2018-06-29 by Kevin Feasel

Hugo Almeida has some hints for backing up Azure Data Lake Store data using Azure Data Factory:

Our Hadoop HDP IaaS cluster on Azure uses Azure Data Lake Store (ADLS) for data repository and accesses it through an applicational user created on Azure Active Directory (AAD). Check this tutorial if you want to connect your own Hadoop to ADLS.

Our ADLS is getting bigger and we’re working on a backup strategy for it. ADLS provides locally-redundant storage (LRS), however, this does not prevent our application from corrupting data or accidentally deleting it. Since Microsoft hasn’t published a new version of ADLS with a clone feature we had to find a way to backup all the data stored in our data lake.

We’re going to show you How to do a full ADLS backup with Azure Data Factory (ADF). ADF does not preserve permissions. However, our Hadoop client can only access the AzureDataLakeStoreFilesystem (adl) through hive with a “hive” user and we can generate these permissions before the backup.

Read the whole thing if you’re thinking of using Azure Data Lake Store.

Comments closed

Alerting On Azure Data Lake Store Data Usage

Published 2018-06-29 by Kevin Feasel

Jose Lara shows off an interesting feature in Azure Data Lake Store:

The massive scale and capabilities of Azure Data Lake Store are regularly used by companies for big data storage. As the number of files, file types, and folders grow, things get harder to manage and staying compliant becomes a greater challenge for companies. Regulations such as GDPR (General Data Protection Regulation) have heightened requirements for control and supervision of files that contain sensitive data.

In this blog post, I’ll show you how to set up alerts in your Azure Data Lake Store to make managing your data easier. We will create a log analytics query and an alert that monitors a specific path and file type and sends a notification whenever the path or file is created, accessed, modified, or deleted.

Auditing access has historically been tricky, so it’s nice that they were able to get that in.

Comments closed

Scaling Azure Analysis Services

Published 2018-06-28 by Kevin Feasel

Chris Seferlis helps us make the decision between scaling up or out with Azure Analysis Services:

Some of you may not know when or how to scale up your queries or scale out your processing. Today I’d like to help with understanding when and how using Azure Analysis Services. First, you need to decide which tier you should be using. You can do that by looking at the QPUs (Query Processing Units) of each tier on Azure. Here’s a quick breakdown:

Developer Tier – gives you up to 20 QPUs
Basic Tier – is a mid-scale tier, not meant for heavy loads
Standard Tier (currently the highest available) – allows you more capability and flexibility

Read on for some pointers.

Comments closed

Clustered Columnstore Index Online Rebuild

Published 2018-06-25 by Kevin Feasel

Niko Neugebauer looks at a feature which will pop up in SQL Server vNext:

The current state of the Clustered Columnstore Index ONLINE rebuild points to be an unfinished version, which will definitely get vastly improved before being released & supported in SQL Server. I have seen a couple of deadlocks and canceled transactions and so I decided that this blog post will get updated as soon as there will be an official announcement of this feature.
If you are still looking to start working on this feature, then I would suggest trying it on smaller tables. Like really, really small ones.
Oh, and for online rebuild operation focus on using partition rebuild – you are using the partitioning, right ? 🙂

Niko gave this a try in Azure SQL Database, as there is no publicly available version of SQL Server which supports this. I’ve been waiting for this feature for 3 years now, so I’ll be happy to see it in production.

Comments closed

Column-Level Security In Azure SQL Data Warehouse

Published 2018-06-25 by Kevin Feasel

Kavitha Jonnakuti announces a new feature for Azure SQL Data Warehouse:

Access to the table columns can be controlled based on the user’s execution context or their group membership with the standard GRANT T-SQL statement. To secure your data, you simply define a security policy via the GRANT statement to your table columns. For example, if you would like to limit access to PII data in your customers table, you can simply GRANT SELECT permissions on specific columns to the ContractEmp role:
GRANT SELECT ON dbo.Customers (CustomerId, FirstName, LastName) TO ContractEmp;
This capability is available now in all Azure regions with no additional charge.

This has been in regular SQL Server for a long time, so it’s good to see it make its way into Azure SQL Data Warehouse, and in a manner which doesn’t involve creating user-defined functions for predicates like Row-Level Security.

Comments closed

Thoughts On Spending Money In The Cloud

Published 2018-06-25 by Kevin Feasel

Andy Leonard has a few thoughts on spending money once you’ve migrated to the cloud:

In a previous consulting life, a customer contacted us and asked for an evaluation of their architecture. They were attempting to scale the business and encountering… obstacles. A team looked over their software and database designs and recommended a rewrite of their custom code. We supplied a proposal (including an expensive estimate) to deliver the re-architecture, redesign, and rewriting of their code.

They felt the price was too high.

We understood, so we countered with a proposal to coach their team to deliver the same result. This would cost less, the outcome would be the same, and their team would grok the solution (because their team would build the solution). The catch? It would take longer.

They felt this solution would take too long.

Andy has some great thoughts on the subject. One area where I’d push further is to say that the best way to take advantage of cloud services is not the best way to take advantage of on-prem services, so even if you have a well-architected on-prem solution, it might not be ideal for running in AWS or Azure.

Comments closed

Executing SSIS From Azure Data Factory

Published 2018-06-22 by Kevin Feasel

Andy Leonard shows us how to execute an SSIS package from Azure Data Factory:

The good people who work on Azure Data Factory recently added an Execute SSIS Package activity. It’s pretty cool. Let’s tinker with it some, shall we?

First, you will need to create an Azure Data Factory SSIS Integration Runtime. If you don’t know how, that’s ok – I’ve written a post titled Lift and Shift SSIS Part 0: Creating the ADF Integration Runtime that describes one way to set up ADFIR.

Read on for an example.

Comments closed

Azure Without ARM

Published 2018-06-21 by Kevin Feasel

Ed Elliott gives us a few ways of deploying Azure resources without using ARM templates:

So, what are our options?

Create/Edit/Delete ourselves using Powershell/.Net/Python/Go/Java/Some Other SDK
Process something else (YAML?) into JSON
Generate the ARM using c#/Powershell/something else
3rd party tools, (Terraform is the big daddy) / others include Sparkle Formation

To be honest, I’d probably just stick with ARM templates.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31