Category: Cloud

Assigning Tags In Azure

Published 2017-07-20 by Kevin Feasel

Melissa Coates shows how to assign tags to resources in Azure using Powershell:

You can assign tags for resource groups, as well as individual resources which support Azure Resource Manager. The individual resources do not automatically inherit tags from the resource group parent. A maximum of 15 key/value pairs can be assigned (though you could store concatenated values or embedded JSON in a single tag value as a workaround). You may want to just assign tags at just the resource group level, and use custom queries to “inherit” at the resource level. Alternatively, you may want to assign tags to the individual resources directly particularly if you want to see them clearly on the standard “download usage” report of billing.

Since the key/value pairs are just free-form text, watch out for uniformity issues. To improve consistency, you can utilize policies to require tags and/or apply defaults if you’d like (for example, you might want to enforce a “Created By” tag). Tags can be set in the ARM template when you initially deploy a resource (which is best so that no billing occurs without proper tagging), or afterwards to existing resources via the portal, PowerShell, or CLI.

Melissa also shows how to query those tags using Powershell.

Comments closed

Understanding Azure Billing

Published 2017-07-20 by Kevin Feasel

Denny Cherry explains some of the oddities behind Azure billing:

Some things stand out to me when looking at this bill. First is the service that is being billed. It doesn’t say how many DWs we’re being billed more (more on that later).

Something else odd is that the SQL DW shows being billed for 1220 hours. Now in theory the most number of hours for any single service you can be billed for in a single month is 744 as that’s 24 (hours) *31 (days), but here we’re being billed for 1220 hours. Now we know for a fact that this SQL DW is powered off at times, so how are we using this resource for more than 31 days in a month?

The last oddity is the rate. $1.46 an hour. This points us to the reason for all of these strange numbers. That’s the rate to run a SQL DW at DW100 for one hour.

Click through for Denny’s explanation.

Comments closed

Data Lake Tools For VS Code Updated

Published 2017-07-14 by Kevin Feasel

Jenny Jiang announces Azure Data Lake Tools for Visual Studio Code’s July update:

Local Debug enables you to debug your C# code behind, step through the code, and validate your script locally before submitting to ADLA.

Use command ADL: Start Local Run Service to start local run service and set a breakpoint in your code behind, then click command ADL: Local Debug to start local debug service. You can debug through the debug console and view parameter, variable, and call stack information.

Click through to see the other improvements.

Comments closed

Azure SQL Database Security Basics

Published 2017-07-13 by Kevin Feasel

Arun Sirpal explains some of the security features exposed in Azure SQL Database:

You do not really have to use all of them, this is down to you and your requirements but at least you have decisions to make. I will mention TDE (Transparent Data Encryption), I know a lot of people will opt for TDE in Azure. The big advantage of TDE in Azure over the earthed flavour is that Microsoft does a lot of the work for you, especially around the key management side of things. Also assuming your database is in a geo-replication partnership it will be protected by a different key on each of the servers. Microsoft will also rotate your certificate at least every 90 days, doing this with a local based SQL Server can be quite manual and fiddly (well I think it is).

Read the whole thing if you’re thinking of moving forward with Azure SQL Database, or if you already have a database up in Azure and haven’t checked the latest offerings yet.

Comments closed

Streaming ETL Using CDC And Event Hub

Published 2017-07-12 by Kevin Feasel

Rolf Tesmer combines Change Data Capture and Event Hubs to build a streaming ETL solution:

The solution picks up the SQL data changes from the CDC Change Tracking system tables, creates JSON messages from the change rows, and then posts the message to an Azure Event Hub. Once landed in the Event Hub an Azure Stream Analytics (ASA) Job distributes the changes into the multiple outputs.

What I found pretty cool was that I could transmit SQL delta changes from source to target in as little as 5 seconds end to end!

There are a bunch of steps, but the end result is worth it.

Comments closed

Loan Chargeoff Templates

Published 2017-07-05 by Kevin Feasel

Ajay Jagannathan announces a couple new Cortana Intelligence Solutions Gallery templates:

For more information, read this blog: End to End Loan ChargeOff Prediction Built Using Azure HDInsight Spark Clusters and SQL Server 2016 R Service

We have published two solution templates deployable using two technology stacks for the above chargeoff scenario:-

Loan Chargeoff Prediction using SQL Server 2016 R Services – Using DSVM with SQL Server 2016 and Microsoft ML, this solution template walks through how to create and clean up a set of simulated data, use 5 different models to train, select the best performant model, perform scoring using the model and save the prediction results back to SQL Server. A PowerBI report connects to the prediction table and show interactive reports with the user on the chargeoff prediction.
Loan Chargeoff Prediction using HDInsight Spark Clusters – This solution demonstrates how to develop machine learning models for predicting loan chargeoff (including data processing, feature engineering, training and evaluating models), deploy the models as a web service (on the edge node) and consume the web service remotely with Microsoft R Server on Azure HDInsight Spark clusters. The final predictions is saved to a Hive table which could be visualized in Power BI.

These tend to be nice because they show you how the different pieces of the Azure stack tie together.

Comments closed

S3Guard

Published 2017-06-30 by Kevin Feasel

Mingliang Liu and Rajesh Balamohan explain why you shouldn’t use S3 as your primary Hadoop data store, as well as a tool which helps mitigate those problems:

Some of the real world use cases which can be impacted due to the S3 eventual consistency model are:

Listing Files. Newly created files might not be visible for data processing. In Hive, Spark and MapReduce, this can lead to erroneous results from incomplete source data or failure to commit all intermediate results.
ETL Workflow. Systems like Oozie rely on marker files to trigger the subsequent workflows. Any delay in the visibility of these files can lead to delays in the subsequent workflows.
Existence-guarded path operations. Any action which fails if the destination path is present may see a deleted file in a listing, and so fail — even though the file has already been deleted.

Read on to see how S3Guard works and how to enable it in HDP 2.6.

Comments closed

Running H2O In R On Azure HDInsight

Published 2017-06-28 by Kevin Feasel

Daisy Deng shows how to configure HDInsight to be able to run the H2O package in R rather than Python or Scala:

We provide a few script actions for installing rsparkling on Azure HDInsight. When creating the HDInsight cluster, you can run the following script action for header node:

https://bostoncaqs.blob.core.windows.net/scriptaction/scriptaction-head.sh

And run the following action for the worker node:

https://bostoncaqs.blob.core.windows.net/scriptaction/scriptaction-worker.sh

Please consult Customize Linux-based HDInsight clusters using Script Action for more details.

Click through for the full process.

Comments closed

Bacpacing In Azure

Published 2017-06-28 by Kevin Feasel

Derik Hammer shows how to use a bacpac file to deploy an existing database to Azure SQL Database:

The recommended method for working with Azure is always PowerShell. The Azure portal and SSMS are tools there for your convenience but they do not scale well. If you have multiple databases to migrate, potentially from multiple servers, using PowerShell will be much more efficient. Scripting your Azure work makes it repeatable and works towards the Infrastructure as Code concept.

In this demonstration, the below steps will be used.

Export the bacpac file to a local directory with sqlpackage.exe.
Copy the bacpac to Azure Blob Storage with AzCopy.exe
Use the PowerShell AzureRM module and cmdlets to create an Azure SQL Database from the bacpac file.

Derik shows the point-and-click way as well as the Powershell way.

Comments closed

Polybase Design Patterns On Azure SQL DW

Published 2017-06-28 by Kevin Feasel

Simon Whiteley continues his Polybase on Azure SQL Data Warehouse series. First, he covers data loading patterns:

That’s enough about data loading for now, there’s another major use case for Polybase that we haven’t yet discussed. Many data processing solutions have a huge, unwieldy overnight batch job that performs aggregates, lookups, analytics and various other calculations.

However, it is often the case that this is not timely enough for many business requirements. This is where Polybase can help.

If we have an External Table over the newest files, this will read these new records at query time. We can write a view that combines the External Table with our batch-produced table. This will obviously go a little slower than usual, given it has to read data from flat files each time, however the results returned will be up to date.

Simon then covers the Create Table As Select statement:

In order to utilise SQLDW effectively, we write SQL for our transformations, rather than relying on external tools such as SSIS. This ensures the work is being done by our compute nodes and, therefore, can be scaled up to increase performance.

General best practice, therefore, would be write stored procedures for each of the data movements we want to occur. This allows us to add in auditing, logging etc. But what we’re interested in here is the core data movement itself.

Writing a traditional INSERT statement isn’t the fastest way to get data into a table. There is a special syntax which creates a new table and inserts into it, that is automatically configured for optimal bulk loading, this is the CTAS, or “Create Table as Select” statement.

This is a pair of interesting posts from Simon.

Comments closed