Press "Enter" to skip to content

Category: Cloud

SKLearn To Azure ML

David Crook shows how to build a model using Python’s SciKit library and then operationalize it in Azure ML:

Why Model Outside Azure ML?

Sometimes you run into things like various limitations, speed, data size or perhaps you just iterate better on your own workstation.  I find myself significantly faster on my workstation or in a jupyter notebook that lives on a big ol’ server doing my experiments.  Modelling outside Azure ML allows me to use the full capabilities of whatever infrastructure and framework I want for training.

So Why Operationalize with Azure ML?

AzureML has several benefits such as auto-scale, token generation, high speed python execution modules, api versioning, sharing, tight PaaS integration with things like Stream Analytics among many other things.  This really does make life easier for me.  Sure I can deploy a flask app via docker somewhere, but then, I need to worry about things like load balancing, and then security and I really just don’t want to do that.  I want to build a model, deploy it, and move to the next one.  My value is A.I. not web management, so the more time I spend delivering my value, the more impactful I can be.

Read the whole thing.

Comments closed

Cortana Intelligence Solutions

James Serra gives an introductory walkthrough to Cortana Intelligence Solutions:

Cortana Intelligence Solutions is a new tool just released in public preview that enables users to rapidly discover, easily provision, quickly experiment with, and jumpstart production grade analytical solutions using the Cortana Intelligence Suite (CIS).  It does so using preconfigured solutions, reference architectures and design patterns (I’ll just call all these solutions “patterns” for short).  At the heart of each Cortana Intelligence Solution pattern is one or more ARM Templates which describe the Azure resources to be provisioned in the user’s Azure subscription.  Cortana Intelligence Solution patterns can be complex with multiple ARM templates, interspersed with custom tasks (Web Jobs) and/or manual steps (such as Power BI authorization in Stream Analytics job outputs).

So instead of having to manually go to the Azure web portal and provision many sources, these patterns will do it for you automatically.  Think of a pattern as a way to accelerate the process of building an end-to-end demo on top of CIS.  A deployed solution will provision your subscription with necessary CIS components (i.e. Event Hub, Stream Analytics, HDInsight, Data Factory, Machine Learning, etc.) and build the relationships between them.

James also walks through an entire solution, so check it out.

Comments closed

Using Xgboost In Azure ML Studio

Koos van Strien wants to use the xgboost model in Azure ML Studio:

Because the high-level path of bringing trained R models from the local R environment towards the cloud Azure ML is almost identical to the Python one I showed two weeks ago, I use the same four steps to guide you through the process:

  1. Export the trained model

  2. Zip the exported files

  3. Upload to the Azure ML environment

  4. Embed in your Azure ML solution

Read the whole thing.

Comments closed

SQL Server Configuration Section On Azure VM

Jack Li diagnoses an issue in which the SQL Server Configuration section of an Azure Virtual Machine only appeared under certain circumstances:

If you created an SQL Server VM via azure portal, there will be a section called “SQL Server Configuration” which was introduced via blog “Introducing a simplified configuration experience for SQL Server in Azure Virtual Machines”. Here is a screenshot of that setting.  It allows you to configure various things like auto backup, patching or storage etc.

I got a customer who created a SQL VM via powershell.  But that VM doesn’t have the section “SQL Server Configuration”.   Using his powershell script, I was able to reproduce the behavior.  When I created via portal UI, I got the “SQL Server Configuration”.

Read on for the solution.

Comments closed

SSRS Express And Azure Limitations

William Assaf points out that SQL Server Reporting Services Express Edition cannot connect to Azure SQL Database:

Express editions of SQL Server Reporting Service, from SQL 2016 on down, cannot connect to Azure SQL Databases. Turns out, getting something for free does have some significant limitations.

For example, you’ll see an error message “The Report Server has encountered a configuration error” on a data source page, when creating a new SSRS data source in the Report Manager website. What you may have not noticed on this page was the possible values in the Data Source Type drop down list.

This is an important limitation if you were thinking of living on the free tier of SSRS.

Comments closed

XGBoost

Koos van Strien moves from Python to R to run an xgboost algorithm:

Note that the parameters of xgboost used here fall in three categories:

  • General parameters

    • nthread (number of threads used, here 8 = the number of cores in my laptop)
  • Booster parameters

    • max.depth (of tree)
    • eta
  • Learning task parameters

    • objective: type of learning task (softmax for multiclass classification)
    • num_class: needed for the “softmax” algorithm: how many classes to predict?
  • Command Line Parameters

    • nround: number of rounds for boosting

Read the whole thing.

Comments closed

Azure SQL Data Warehouse Setup

Arun Sirpal configures a new instance of Azure SQL Data Warehouse:

The information shown here is the DSQL (Distributed SQL) plan – When you send a SQL query to SQL Data Warehouse, the Control node processes a query and converts the code to DSQL then the Control node sends the command to run in each of the compute nodes.

The returned query plan depicts sequential SQL statements; when the query runs it may involve parallelized operations, so some of the sequential statements shown may run at the same time. More information can be found at the following URL https://msdn.microsoft.com/en-us/library/mt631615.aspx.

Arun also looks at running a simple Power BI report off of Azure SQL Data Warehouse; click through for that.

Comments closed

Azure SQL Data Warehouse Architecture

Warner Chaves looks at system views in Azure SQL Data Warehouse:

Unlike the sys.dm_exec_requests view in SQL Server, the sys.dm_pdw_exec_requests view actually keeps up to 10000 records with the information of a request even after it has executed. This capability is very useful as you can track specific query executions as long as their records are still among the 10000 kept by the view. As time passes the oldest records are phased out in favor of more recent ones.

This is an interesting look at some of the differences between Azure SQL Data Warehouse and a “normal” SQL Server installation.  Good reading.

Comments closed

Kinesis Analytics

Ryan Nienhuis shows how to implement Amazon Kinesis Analytics:

As I covered in the first post, streaming data is continuously generated; therefore, you need to specify bounds when processing data to make your result set deterministic. Some SQL statements operate on individual rows and have natural bounds, such as a continuous filter that evaluates each row based upon a defined SQL WHERE clause. However, SQL statements that process data across rows need to have set bounds, such as calculating the average of particular column. The mechanism that provides these bounds is a window.

Windows are important because they define the bounds for which you want your query to operate. The starting bound is usually the current row that Amazon Kinesis Analytics is processing, and the window defines the ending bound.

Windows are required with any query that works across rows, because the in-application stream is unbounded and windows provide a mechanism to bind the result set and make the query deterministic. Analytics supports three types of windows: tumbling, sliding, and custom.

The concepts here are very similar to Azure’s Stream Analytics.

Comments closed

Improving HBase Cluster Restart Time

Nitin Verma explains how to re-create an HBase cluster a bit faster:

When flush ‘table’ operation is triggered, all the regions belonging to that table will flush independently. Once the HFile corresponding to a region is flushed, it records the max sequence id in metadata and notifies the WAL corresponding to the regionserver. WAL maintains a mapping table for regions and their corresponding flushed sequence id’s. When the HBase cluster restarts, the hMaster will distribute flushed sequence id’s per region to the recovery threads splitting the WAL, so that they can skip the edits which have already been persisted in HFiles.

This is particularly important for clusters which frequently spin up and down, a feature of Platform-as-a-Service solutions like HDInsight.

Comments closed