Press "Enter" to skip to content

Category: Cloud

Data Lake Archive Tier

Ust Oldfield looks at an important part of a data lake:

The Archive access tier in blob storage was made generally available today (13th December 2017) and with it comes the final piece of the puzzle for archiving data from the data lake.

Where Hot and Cool access tiers can be applied at a storage account level, the Archive access tier can only be applied to a blob storage container. To understand why the Archive access tier can only be applied to a container, you need to understand the features of the Archive access tier. It is intended for data that has no or low SLAs for availability within an organisation and the data is stored offline (Hot and Cool access tiers are online). Therefore, it can take up to 15 hours for data to be made online and available. Bringing Archive data online is a process called rehydration (fitting for the data lake). If you have lots of blob containers in a storage account, you can archive them and rehydrate them as required, rather than having to rehydrate the entire storage account.

Read on for more details, including a pattern for archiving data lake data.
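
To make that concrete, here is a minimal sketch of archiving and then rehydrating a single blob from PowerShell, assuming the classic Azure.Storage module; the account, container, and blob names are placeholders, and SetStandardBlobTier applies to block blobs.

```powershell
# Minimal sketch: requires the Azure.Storage PowerShell module and a storage account key.
# Account, container, and blob names below are placeholders.
$key  = "<storage-account-key>"
$ctx  = New-AzureStorageContext -StorageAccountName "mydatalakearchive" -StorageAccountKey $key
$blob = Get-AzureStorageBlob -Container "archive-2016" -Blob "orders-2016-01.csv" -Context $ctx

# Take the blob offline by moving it to the Archive tier.
$blob.ICloudBlob.SetStandardBlobTier("Archive")

# Later, rehydrate it by setting the tier back to Cool (or Hot).
# As the post notes, rehydration can take up to 15 hours.
$blob.ICloudBlob.SetStandardBlobTier("Cool")
```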

When Data Factory Flows Don’t

Emma Stewart points out an issue that might vex newcomers to Azure Data Factory:

The data within the Data Lake store was organised into a Year and Month hierarchy for the folders, and each day's transactions were stored in a file which was named after the day within the relevant month folder. The task then was to create a pipeline which copies the dataset in the Data Lake Store over to the dbo.Orders table in Azure SQL DB every day within the scheduled period (Q1 2016).

After creating all the JSON scripts and deploying them (with no errors), I clicked on the ‘Monitor and Manage’ tile to monitor the activities, check everything was working as it should be and monitor the progress. After waiting for at least 10 minutes, I started to get frustrated.

Click through for the fix and an explanation.
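
As an aside, if the Monitor and Manage blade is slow to tell you anything, the ADF v1 PowerShell cmdlets can report slice status directly. Here is a rough sketch; the resource group, factory, and dataset names are placeholders.

```powershell
# Rough sketch: requires the AzureRM.DataFactories module (ADF v1) and a signed-in session.
# Resource group, factory, and dataset names are placeholders.
Get-AzureRmDataFactorySlice `
    -ResourceGroupName "rg-adf-demo" `
    -DataFactoryName "adf-orders-copy" `
    -DatasetName "AzureSqlOrdersTable" `
    -StartDateTime "2016-01-01" `
    -EndDateTime "2016-03-31" |
    Select-Object Start, End, State
```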

Using The Command Line To Migrate To Azure SQL Database

Arun Sirpal shows how to use SqlPackage.exe to migrate a database to Azure SQL Database:

I have moved many databases to Azure via different methods, but I recently came across a new way. Well, technically it's not new; I should say, newly found. The migration was done via the command line, which is not exactly groundbreaking, but it's nice to have another option.

The idea behind this is simple: create the BACPAC via the command line using sqlpackage.exe with the action set to Export, then do an Import action into Azure.

Read on for the demo.
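
For a flavour of what those two steps look like, here is a minimal sketch of the SqlPackage calls; the server names, database names, paths, and credentials are all placeholders.

```powershell
# Minimal sketch: assumes sqlpackage.exe is on the PATH (otherwise use its full install path).

# Step 1: export the source database to a BACPAC file.
& sqlpackage.exe /Action:Export `
    /SourceServerName:"localhost" `
    /SourceDatabaseName:"SalesDB" `
    /TargetFile:"C:\temp\SalesDB.bacpac"

# Step 2: import the BACPAC into the Azure SQL Database logical server.
& sqlpackage.exe /Action:Import `
    /SourceFile:"C:\temp\SalesDB.bacpac" `
    /TargetServerName:"myserver.database.windows.net" `
    /TargetDatabaseName:"SalesDB" `
    /TargetUser:"sqladmin" `
    /TargetPassword:"<password>"
```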

Fetching U-SQL Job Input And Output Paths

Matthew Hicks shows how to retrieve information on U-SQL input and output paths using PowerShell:

Each time you submit a U-SQL job, a job folder is created in your Azure Data Lake Store account. This folder contains useful debugging information about the job, including a file called the U-SQL algebra file. This is an XML file containing information about your job graph, the list of input and output files, and other key U-SQL job metadata.

We’ve just published a sample script that reads the U-SQL algebra file for a specified job and returns the input or output files. Give it a try!

Read on for more.
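
The linked sample does the heavy lifting, but the general shape is: find the job's folder in the store, pull down the algebra file, and parse the XML. Here is a very rough sketch of that idea; the job folder path and the XPath are illustrative assumptions, not the sample's actual logic.

```powershell
# Very rough sketch: requires the AzureRM.DataLakeAnalytics and AzureRM.DataLakeStore modules.
# Account names, the job folder path, and the XPath below are placeholders/assumptions.
$jobId = "<job-guid>"
$job   = Get-AzureRmDataLakeAnalyticsJob -Account "myadlaaccount" -JobId $jobId   # optional sanity check

# Download the algebra file from the job folder; the real folder layout under
# /system/jobservice/ varies, which is what the published sample script works out for you.
Export-AzureRmDataLakeStoreItem -Account "myadlsaccount" `
    -Path "/system/jobservice/jobs/Usql/<job-folder>/algebra.xml" `
    -Destination "C:\temp\algebra.xml"

# Parse the XML and list anything that carries a path attribute (inputs and outputs).
[xml]$algebra = Get-Content "C:\temp\algebra.xml"
$algebra.SelectNodes("//*[@path]") | ForEach-Object { $_.path }
```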

Larger Azure SQL Database Standard Tier Sizes

Tim Radney reports on a new Standard tier preview for Azure SQL Database:

Previously, the Standard tier only offered 4 levels: 15, 30, 50, and 100 DTUs, with a database size limit of 250GB on standard disk. If you had a database that was larger than 250GB but did not need more than 100 DTUs for CPU, memory, or I/O, you were stuck paying a Premium price just for database size. With the new changes, you can now have up to a 1TB database in the Standard tier; you just have to pay for the extra storage. Currently, storage is being billed at $0.085/GB during the preview. Increasing from the included size of 250GB to 1TB adds 774GB of storage, at a cost of $65.79 per month.

The new Standard preview DTU sizes support 200, 400, 800, 1,600, and 3,000 DTU options. If you have a SQL Server database workload that is more CPU-bound than I/O, these Standard tier options have the potential to save you a lot of money; however, if your workload is I/O bound, the Premium tier is going to outperform the Standard tier.

Tim follows this up with a couple of quick demos.
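
The storage arithmetic is easy to sanity-check against the quoted preview rate:

```powershell
# Extra storage when growing from the included 250GB to the 1TB (1024GB) maximum,
# billed at the quoted preview rate of $0.085 per GB per month.
$extraGB     = 1024 - 250          # 774 GB
$monthlyCost = $extraGB * 0.085    # 65.79
"{0} GB extra at `$0.085/GB is `${1:N2} per month" -f $extraGB, $monthlyCost
```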

Azure Functions Basics

Vincent-Philippe Lauzon explains the basics of Azure Functions:

In general, serverless refers to an economical model where we pay for compute resources used as opposed to “servers”.

Wait…  isn’t that what the Cloud is about?

Well, yes, on a macro-scale it is, but serverless brings it to a micro-scale.

In the cloud we can provision a VM, for example, run it for 3 hours and pay for 3 hours.  But we can’t pay for 5 seconds of compute on a VM because it won’t have time to boot.

A lot of compute services have a "server-full" model. In Azure, for instance, a Web App comes in a number of instances. Each instance has a VM associated with it. We do not manage that VM but we pay for its compute regardless of the number of requests it processes.

In a serverless model, we pay for micro-transactions.

This is the first part in a series and is aimed at giving a conceptual explanation.

Running PySpark In Visual Studio Code

Jenny Jiang shows how to run PySpark on HDInsight in VSCode:

We are excited to introduce the integration of HDInsight PySpark into Visual Studio Code (VSCode), which allows developers to easily edit Python scripts and submit PySpark statements to HDInsight clusters. For PySpark developers who value the productivity of the Python language, the VSCode HDInsight Tools offer a quick Python editor with a simple getting-started experience, and enable you to submit PySpark statements to HDInsight clusters with interactive responses. This interactivity brings the best properties of Python and Spark to developers and empowers you to gain faster insights.

Click through to see how it’s done.

A Look At Azure SQL Database Price By Tier

Arun Sirpal puts together a comparison of Azure SQL Database prices (in GBP) by service tier:

Hopefully this paints a picture for you. I will have my say, though. A Basic tier database is something that you should NOT be using for production workloads; it's quite obvious with the 2GB limit, but it's worth reinforcing the point. The Standard tier is more for your common workloads, and Premium is designed for high transactional volume where I/O performance is much more important to you (my hunch is that maybe they are utilizing SSDs?). I am not sure, but the Premium costs are much higher.

There is another service tier called Premium RS (in preview). My understanding is that performance is similar to that of Premium; however, it is only useful for workloads that can tolerate up to 5 minutes of data loss due to service failures. I will probably not use this for production, but then again, it seems to be nearly half the cost of Premium. Choices, choices, choices.

Two notes with this one.  First, these are prices as of when Arun put together the notes; they will probably fluctuate over time.  Second, there might be differences in prices by data center.  At the very least, though, this gives you an idea of the relative price spread.

Alerting In Azure SQL Database

Arun Sirpal shows how to set up an alert for an Azure SQL Database:

I keep things simple and like to look at certain performance-based metrics, but before talking about what metrics are available, let's step through an example.

For this post I want to set up an alert on CPU percentage utilised, so that when it is greater than 50% over the last 5 minutes I know about it. The first step is to navigate to your Azure SQL Database.

Click through for a screenshot-driven guided tour.
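
If you would rather script the same alert than click through the portal, the classic metric alert cmdlets in AzureRM.Insights can do it. Here is a rough sketch; the subscription ID, resource group, server, database, and location are placeholders.

```powershell
# Rough sketch: requires the AzureRM.Insights module and a signed-in session.
# The subscription ID, resource group, server, database, and location are placeholders.
$dbResourceId = "/subscriptions/<subscription-id>/resourceGroups/rg-sql" +
                "/providers/Microsoft.Sql/servers/myserver/databases/mydb"

Add-AzureRmMetricAlertRule -Name "mydb-high-cpu" `
    -Location "UK South" `
    -ResourceGroup "rg-sql" `
    -TargetResourceId $dbResourceId `
    -MetricName "cpu_percent" `
    -Operator GreaterThan `
    -Threshold 50 `
    -WindowSize (New-TimeSpan -Minutes 5) `
    -TimeAggregationOperator Average
```

From there, an e-mail action (built with New-AzureRmAlertRuleEmail) can be attached to the rule, which covers the "I would like to know about it" part.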
