Press "Enter" to skip to content

Category: Cloud

Spark Usage Scenarios

Rimma Nehme has several usage scenarios for Spark on Azure:

For data scientists, we provide out-of-the-box integration with Jupyter (iPython), the most popular open source notebook in the world. Unlike other managed Spark offerings that might require you to install your own notebooks, we worked with the Jupyter OSS community to enhance the kernel to allow Spark execution through a REST endpoint.

We co-led “Project Livy” with Cloudera and other organizations to create an open source Apache licensed REST web service that makes Spark a more robust back-end for running interactive notebooks.  As a result, Jupyter notebooks are now accessible within HDInsight out-of-the-box. In this scenario, we can use all of the services in Azure mentioned above with Spark with a full notebook experience to author compelling narratives and create data science collaborative spaces. Jupyter is a multi-lingual REPL on steroids. Jupyter notebook provides a collection of tools for scientific computing using powerful interactive shells that combine code execution with the creation of a live computational document. These notebook files can contain arbitrary text, mathematical formulas, input code, results, graphics, videos and any other kind of media that a modern web browser is capable of displaying. So, whether you’re absolutely new to R or Python or SQL or do some serious parallel/technical computing, the Jupyter Notebook in Azure is a great choice.

If you could only learn one new thing in 2016, Spark probably should be that thing.  Also, I probably should agitate a bit more about wanting Spark support within Polybase…

Comments closed


Ginger Grant has a quick intro on U-SQL:

In my previous series on Stream Analytics, I wrote some U-SQL. That U-SQL didn’t look much different than Ansi-SQL, which is sort of the point of porting the functionality to a different yet familiar language. Another application which heavily uses U-SQL is Azure Data Lake. Data Lake stores its data in HDInsight, but you don’t need to write hive to query the data, as U-SQL will do it. Like Hive, U-SQL can be used to create a schema on top of some data, and then query it.

For example, to write a query on this csv file stored in a Data Lake, I would need to create the data definition for the data, then I could easily write a statement to query it.

I’m interested in seeing how much adoption we see in this language.

Comments closed

Azure SQL Data Warehouse Plans

Grant Fritchey shows how to build an execution plan for an Azure SQL Data Warehouse query:

So now we just save this as a .sqlplan file and open it in SSMS, right?


See, that’s not a regular execution plan, at all. Instead, it’s a D-SQL plan. It’s not the same as our old execution plans. You can’t open it as a graphical plan (and no, not even in that very popular 3rd party tool, I tried). You will have to learn how to read these plans differently because, well, they are different.

That’s an unfortunate outcome.  Reading is hard…

Comments closed

Troubleshooting Data Factory Errors

Ginger Grant discusses Azure Data Factory errors:

Unfortunately, while developing Data Factory I became very familiar with errors. All of the errors show up at the end and provide very little insight as to what in the process failed. Here’s an example.

Database operation failed on server ‘’ with SQL Error Number ‘40197’. Error message from database execution : The service has encountered an error processing your request. Please try again. Error code 4815. A severe error occurred on the current command. The results, if any, should be discarded.

This sounds like classic Microsoft error messages:  “An error occurred.  Here is a code you can put into Google and hope desperately that someone has already figured out the answer.  Good luck!”

Comments closed

Training Data With Azure ML

Koos van Strien discusses training data sets and cross-validating results:

When choosing a train and testset, you’ll implicitly introduce a new bias: it could be that the model you just trained predicts well for this particular testset, when trained for this particular trainset. To reduce this bias, you could “cross-validate” your results.

Cross-validation (often abbreviated as just “cv”) splits the dataset into n folds. Each fold is used once as a testset, using all other folds together as a training set. So in our pizza example with 100 records, with 5 folds we will have 5 test runs:

This isn’t Azure ML-specific, and is good reading.

Comments closed

Migrating Data To SQL Server Using Data Factory

Ginger Grant moves data from Azure Blob Storage into Azure SQL Database using Data Factory:

There are instances where data resides in Azure Blob Storage and the data is needed in a SQL database. For example, if one ran a Machine Learning experiment in Data Factory, the results would be stored in Azure Blob storage, and for analysis purposes, it may make a lot more sense to move the data to SQL database. Moving data around in Data Factory, means writing JSON. In this example we will be using an Azure SQL DB, but it is not essential that the data be stored in Azure. An on-premises SQL Server could also be used, as long as a gateway was added for the connection, the other steps would be the same. There are five different Data Factory elements required to move data from an Azure blob to a database: a pipeline for the data, a data set containing the definition for the blob, a linked service for the blob, a data set containing a definition for the SQL Data, and a linked service to connect to the SQL database.

There’s a lot of JSON ahead.

Comments closed

Calculating DTU

John Sterrett gives us a measure for calculating DTUs in Azure SQL Database:

The whole query is below. Right now, let’s just focus on the secret sauce. The secret sauce is how DTU percentage gets calculated.  In a nutshell, the maximum of CPU, Data IO, Log Write Percent determine your DTU percentage.  What does this mean to you? Your max consumer limits you. So, you can be using 1% of your IO but still be slowed down because CPU could be your max consumer resource.

That’s a rather interesting finding.  I think the next step (which may be so context-dependent that it’s not possible to generalize) might be to figure out what various workloads do to the metrics and if there’s a way to predict with some reasonable accuracy the expected DTU load given an anticipated change in workload, rather than seeing the value spike and reacting to it later.

Comments closed

Releasing An Azure Page/Blob Lease

Denny Cherry has VB code to release an Azure page or blob storage lease:

Sometimes when firing up VMs or moving VMs from the page or blob store you’ll get an error that there is still a lease on the file.  To solve this you need to release the lease. But waiting won’t do the trick, as the leases don’t have an expiration date.

I found some VB.NET code online that with some tweaking (with the help of Eli Weinstock-Herman and Christiaan Baes) I was able to get to release the lease.

Click through for the code.

Comments closed

Copying On-Prem Databases To Azure SQL Database

Kenneth Fisher walks us through migrating a database to Azure SQL Database:

It turns out it’s pretty easy (even if it takes some time). So where to start? Well the first thing we need is a place to put our database. An Azure SQL Database Server. If you don’t already have one creating a new one is fairly easy.

First start at Log in and follow these steps

This is the longer, manual process.  It’s good to walk through it this way at least once before writing a Powershell script, just to see what the script is doing.

Comments closed

Azure SQL Database Alerts With Powershell

Mike Fal shows how to create Azure SQL Database alerts using Powershell:

So let’s get down to brass tacks and actually create an alert. To do this, we need some info first:

  • The Resource Group we will create the alert in.

  • An Azure location where the alert will live.

  • An Azure SQL Database server and database we are creating the alert for.

  • What metric we will monitor and what is the threshold we will be checking.

  • (optional) An email to send an alert to.

Mike follows this up with code and shows it’s not scary at all to create these alerts from within Powershell.

Comments closed