Press "Enter" to skip to content

Category: Cloud

Calling Cognitive Services With R

David Smith has written a go-to guide for connecting to Azure Cognitive Services using R:

There’s no official R package (yet!) for calling Cognitive Services APIs. But since every Cognitive Service API is just a standard REST API, we can use the httr package to call the API. Input and output are standard JSON, which we can create and extract using the jsonlite package.

(There’s also an independent R interface to the text APIs. And there are already Python SDKs for many of the services, including the Face API.)

This approach is also useful for other REST APIs, for those times when there isn’t already a pre-built package to do most of the translation work for you.
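
To give a feel for what that looks like in practice, here is a minimal sketch using httr and jsonlite against the Text Analytics sentiment endpoint. The region, API version, and subscription key below are placeholders rather than anything from David's post.

library(httr)
library(jsonlite)

# Endpoint and key are placeholders -- substitute your own region and subscription key
endpoint <- "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"
api_key  <- "YOUR-SUBSCRIPTION-KEY"

# The service expects a JSON body with a "documents" array
payload <- list(documents = list(
  list(id = "1", language = "en", text = "Curated SQL makes it easy to keep up with the community.")
))

resp <- POST(
  endpoint,
  add_headers(`Ocp-Apim-Subscription-Key` = api_key,
              `Content-Type` = "application/json"),
  body = toJSON(payload, auto_unbox = TRUE)
)

# The response is JSON as well; pull it back into an R structure
result <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
result$documents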


Processing Azure Analysis Services

Bill Anton shows how to process an Azure Analysis Services tabular model:

This post contains a list of various methods that can be used to process (i.e. load data into) an Azure AS tabular model. As you will see – not much has changed from the regular on-premise version (which is a very good thing as it softens the learning curve).

Read on if you’re looking at putting an Analysis Services model into Azure.


Sparklyr On HDInsight

Ali Zaidi has a walkthrough on using sparklyr on HDInsight:

The majority of Spark is written in Scala (~80% of Spark core), which is a functional programming language. Functional programming languages emphasize functional purity (the output only depends on the inputs) and strive to avoid side-effects. One important component of most functional programming languages is their lazy evaluation. While it might seem odd that we would appreciate laziness from our computing tools, lazy evaluation is an effective way of ensuring computations are evaluated in the most efficient manner possible.

Lazy evaluation allows Spark SQL to highly optimize the queries. When a user submits a query to Spark SQL, Spark composes the components of the SQL query into a logical plan. The logical plan is basically a recipe Spark SQL creates in order to evaluate the desired query. Spark SQL then submits the logical plan to its highly optimized engine called Catalyst, which optimizes this plan into a physical plan of action that is executed inside the Spark computation engine (a series of coordinating JVMs).

Read on for more description and code.
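
To make the laziness concrete, here is a small sparklyr sketch along the lines Ali describes; the YARN connection setting and the sample data are assumptions on my part, not code from his walkthrough. The dplyr verbs only build up a plan, and nothing runs on the cluster until collect() is called.

library(sparklyr)
library(dplyr)

# Connect to Spark on the cluster through YARN (the master setting is an assumption
# about the HDInsight setup; adjust to match your cluster)
sc <- spark_connect(master = "yarn-client")

# Copy an illustrative local dataset into Spark as a table
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)

# These dplyr verbs are lazy: they only compose a logical plan for Spark SQL
delayed <- flights_tbl %>%
  filter(dep_delay > 60) %>%
  group_by(carrier) %>%
  summarise(avg_delay = mean(dep_delay))

# Nothing has executed yet; collect() hands the plan to Catalyst, which optimizes
# it into a physical plan and runs it on the cluster
result <- collect(delayed)

spark_disconnect(sc)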


Elastic Database Jobs

Mark Vaillancourt looks at Elastic Database Jobs in Azure:

The new Elastic Database Jobs are designed to echo the functionality that folks working with SQL Server are accustomed to on-prem with SQL Agent. But it’s even better than that. There are many features that are just baked in that you no longer have to worry about. I’ve presented on the new Elastic Jobs as part of a larger presentation on the overall Elastic tools associated with Azure SQL Database a handful of times. That presentation is called Azure SQL Database Elastic Boogie and references Marcia Griffiths’ hit song Electric Boogie (The Electric Slide). Yeah. I know. That will explain the use of the word boogie all over the place.

Even though it’s just a very new private preview, my experience has been a great one. Huge kudos to Debra and her team on that.

This sounds pretty good.  I really like the dynamic resolution portion and wish that on-prem SQL Agent jobs could do the same out of the box.


TMSCHEMA DMVs

Meagan Longoria wants Azure Analysis Services documentation:

It would be great to get the DMVs documented similarly to the MDSCHEMA DMVs, as they are quite useful for tasks like documenting your tabular model.  Since the TMSCHEMA DMVs work in Azure Analysis Services as well, I have logged this request on the Azure AS User Voice. Please lend me a vote so we can make this information more easily available.

Please vote on this.


Free Trial Of Azure SQL Data Warehouse

James Serra notes that there is a free one-month trial of Azure SQL Data Warehouse:

You can use this one month free trial to do POCs and try out SQL DW up to 200 DWU and 2TB of data.  You must sign up by December 31st 2016.  Please note that once the one month free trial is over, you will start getting billed at general availability pricing rates.  For more information on the free trial, and to sign up, go here.

This is great because you can quickly run out of credits otherwise.


Test Connection With HDInsight

I have a post in which I try to test a connection to HDInsight:

WebHCat is a web-based REST API for HCatalog, a management layer for dealing with files in HDFS.  If you’re looking for configuration settings for WebHCat, you’ll generally want to look for “templeton” in config files, as Templeton was the project name before WebHCat.  In Ambari, you can go to the Hive configs and look at webhcat-site.xml for configuration settings.  For WebHCat, the default port in HDInsight is 30111, which you should find in the templeton.port configuration setting.

I don’t like the fact that WebHDFS is blocked, but at least WebHCat is functional.
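
For a rough idea of what a connection test looks like, here is an httr sketch against the WebHCat status endpoint exposed through the HDInsight gateway; the cluster name and credentials are placeholders.

library(httr)
library(jsonlite)

# Cluster name and admin (cluster login) credentials are placeholders
cluster_url <- "https://mycluster.azurehdinsight.net/templeton/v1/status"

resp <- GET(cluster_url, authenticate("admin", "cluster-login-password"))

status_code(resp)                                          # 200 means WebHCat answered
fromJSON(content(resp, as = "text", encoding = "UTF-8"))   # typically a status/version pair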


Spark Clusters On Spot Pricing

Sameer Farooqui explains spot pricing with respect to AWS servers:

The idea behind Spot instances is to allow you to bid on spare Amazon EC2 compute capacity. You choose the max price you’re willing to pay per EC2 instance hour. If your bid meets or exceeds the Spot market price, you win the Spot instances. However, unlike traditional bidding, when your Spot instances start running, you pay the live Spot market price (not your bid amount). Spot prices fluctuate based on the supply and demand of available EC2 compute capacity and are specific to different regions and availability zones.

So, although you may have bid $0.55 per hour for an r3.2xlarge instance, you’ll end up paying only $0.10 an hour if that’s what the going rate is for the region and availability zone.

Databricks uses spot pricing for Community Edition clusters to control costs.  Click through for a very interesting discussion of spot pricing and how they take advantage of it.
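
The billing rule from the quote is simple enough to sketch out; the numbers below are illustrative only.

# You pay the going Spot market price (not your bid) as long as your bid covers it;
# if the market price rises above your bid, the instance is reclaimed.
spot_hourly_cost <- function(bid, market_price) {
  if (bid >= market_price) market_price else NA_real_   # NA: you lose the instance
}

spot_hourly_cost(bid = 0.55, market_price = 0.10)   # pays $0.10/hour despite the $0.55 bid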


Capturing SSAS Query Activity

Bill Anton explains why and how he captures query activity by user in SSAS:

In most environments, it is trivial to obtain the name of the user who ran each query… all you have to do is capture the [QueryEnd] event in a profiler/xevent trace and pull the information from the [NTUserName] field. However, in environments involving Power BI and the Enterprise On-Premise Data Gateway, there’s a bit more to it.

The main issue is how authentication is handled in this type of architecture. When working with Power BI reports connected to an on-premise data source via the On-Premise Data Gateway, the account of the user running the report is passed as the “EffectiveUsername”. The implication here is that the value shown in the [NTUserName] field of the xevent/profiler trace is going to be the Data Gateway account – NOT the account of the user who actually generated the activity.

Read on for the full answer.


Developing In The Cloud

Richie Rump has some nice pointers about developing for Azure or AWS:

Since we’re a bunch of data freaks, we wanted to make sure that our data and files are properly backed up. I set out to create a script that will back up DynamoDB to a file and copy the data in S3 to Azure. The reasoning for saving our backups into a different cloud provider is pretty straightforward. First, we wanted to keep the data in a separate cloud account from the application. We didn’t make the same mistakes that Code Spaces did. Secondly, I wanted to kick the tires of Azure a bit. Heck, why not?

I figured this script would take me a day to write and a morning to deploy. In the end it took four days to write and deploy. So here are some lessons that I learned the hard way from trying to bang out this backup code.

This is a must-read if you’re starting to look at using cloud providers for services.
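
Richie’s actual script isn’t reproduced in the excerpt, but to give a sense of the moving parts, here is a hedged R sketch of the same idea using the paws, aws.s3, jsonlite, and AzureStor packages. The table, bucket, container, and storage account names are all made up, and it stages files locally rather than streaming from S3 to Azure directly.

library(paws)       # AWS SDK bindings for R
library(aws.s3)
library(jsonlite)
library(AzureStor)

# 1. Dump a DynamoDB table to a local JSON file (table name is hypothetical);
#    a production script would paginate the scan for large tables
dynamo <- paws::dynamodb()
items  <- dynamo$scan(TableName = "app-table")$Items
write_json(items, "dynamodb-backup.json", auto_unbox = TRUE)

# 2. Pull an object down from S3 (bucket and key are hypothetical)
save_object(object = "exports/app-data.csv", bucket = "app-bucket", file = "app-data.csv")

# 3. Push both files to a container in a separate Azure storage account, so the
#    backups live outside the AWS account that runs the application
endp <- storage_endpoint("https://backupaccount.blob.core.windows.net", key = "STORAGE-ACCOUNT-KEY")
cont <- storage_container(endp, "backups")
storage_upload(cont, src = "dynamodb-backup.json", dest = "dynamodb-backup.json")
storage_upload(cont, src = "app-data.csv", dest = "app-data.csv")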
