Press "Enter" to skip to content

Category: Cloud

Azure SQL Database Supports R Integration

David Smith notes that Azure SQL Database now has (in preview) support for R:

Azure SQL Database, the database-as-a-service based on Microsoft SQL Server, now offers R integration. (The service is currently in preview; details on how to sign up for the preview are provided at that link.) While you’ve been able to run R in SQL Server in the cloud since the release of SQL Server 2016 by running a virtual machine, Azure SQL Database is a fully managed instance that doesn’t require you to set up and maintain the underlying infrastructure. You just choose the size and scale of the database you want to manage, and then connect to it like any other SQL Server instance. (If you want to learn how to set up an Azure SQL database, this Microsoft Learn module is a good place to start.)
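
If you haven’t used in-database R before, it runs through the sp_execute_external_script stored procedure, and presumably the Azure SQL Database preview exposes the same interface as SQL Server 2016 and later. Here’s a rough sketch of calling it from Python with pyodbc; the server name and credentials are placeholders:

import pyodbc

# Placeholder connection string; substitute your own server, database, and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"
    "DATABASE=<your-db>;UID=<user>;PWD=<password>"
)

# sp_execute_external_script runs the R script server-side; the T-SQL query
# feeds it a data frame named InputDataSet, and OutputDataSet comes back as a result set.
sql = """
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(mean_val = mean(InputDataSet$val))',
    @input_data_1 = N'SELECT CAST(value AS float) AS val FROM (VALUES (1),(2),(3)) v(value)'
WITH RESULT SETS ((mean_val float));
"""

for row in conn.cursor().execute(sql):
    print(row.mean_val)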

Python and Java are not yet supported, but I’d imagine they’re on the way too.

Using Azure ML To Approve Expenses Automatically

Isabelle Van Campenhoudt walks us through a scenario of using Azure ML to find expense reports which should automatically be approved, reducing the workload for approvers:

My partner in crime Serge Luca, aka Doctor Flow, is the author of a nice and complex expense approval system in Microsoft Flow.
One year ago, he asked me to add analytics to his Flow. This year he had the interesting idea of adding machine learning-based approval to his flow and suggested I work on it. The idea is the following: since we have a lot of approvals in our system, can a machine learn and find a decision pattern to apply automatically to each expense request?
I decided to use Microsoft Azure Machine Learning Studio. In this tool, you can build experiments and use some of the most common and useful machine learning algorithms. It was amazing to see how easy it is to create and consume machine learning.

This contrasts with Ginger Grant’s nightmare scenario pretty well:  instead of trying to get the ML process to do all of the work, create a process which takes care of the really easy stuff and leave harder tasks to specialists with a deeper understanding of the rules.  That way they don’t have to spend their time on trivialities.
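
To give a flavor of the consumption side: once an Azure ML Studio experiment is published as a web service, scoring an expense request is a single REST call. A minimal sketch, assuming the classic request-response endpoint; the URL, API key, and column names below are all placeholders:

import requests

# Placeholders: copy the real endpoint and key from the experiment's web service page.
url = "https://<region>.services.azureml.net/workspaces/<ws-id>/services/<svc-id>/execute?api-version=2.0"
api_key = "<api-key>"

# Azure ML Studio's request-response format: named inputs with columns and rows.
body = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Amount", "Category", "Requester"],
            "Values": [["125.40", "Travel", "jdoe"]],
        }
    },
    "GlobalParameters": {},
}

resp = requests.post(url, json=body, headers={"Authorization": "Bearer " + api_key})
resp.raise_for_status()
print(resp.json())  # includes the scored approval decision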

The Risk Of Shadow IT In The Cloud

Kenneth Fisher walks us through the risk of increased Shadow IT with migration to the cloud:

Shadow IT has been, well, maybe not the bane of the IT department, but certainly a pain in the neck. On the off chance you’ve never heard of shadow IT, do any of these sound familiar?

  • A user asks you to restore a corrupt database on a SQL Server you’ve never heard of and that isn’t in your inventory. (And it’s 50/50 odds there’s never been a backup taken.)

  • You do a licensing true-up and dozens of new SQL Servers suddenly show up.

  • You hear from a user: “We have this mission critical Access database that suddenly isn’t working. I know you don’t support Access, but you’re the database person, so we need you to fix it.”

It’s an interesting short essay and worth thinking about if you’re in the cloud or moving that way.

Working With The Databricks API Via PowerShell

Gerhard Brueckl has a PowerShell module for interacting with Databricks, on either Azure or AWS:

As most of our deployments use PowerShell, I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended up having 20+ individual scripts, which was not really maintainable any more. So I packed them into a PowerShell module and also published it to the PowerShell Gallery (https://www.powershellgallery.com/packages/DatabricksPS) for everyone to use!
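
To give an idea of what the module wraps, here is roughly what one of those Databricks REST API calls looks like in raw form, sketched in Python; the workspace URL and token are placeholders:

import requests

# Placeholders: your workspace URL and a personal access token.
host = "https://<region>.azuredatabricks.net"
token = "<personal-access-token>"

# One of the calls a module like DatabricksPS wraps for you:
# list the clusters in the workspace via the REST API 2.0.
resp = requests.get(
    host + "/api/2.0/clusters/list",
    headers={"Authorization": "Bearer " + token},
)
resp.raise_for_status()

for cluster in resp.json().get("clusters", []):
    print(cluster["cluster_id"], cluster["cluster_name"], cluster["state"])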

This looks like a pretty good module if you work with Databricks.

Migrating A Database To Managed Instances

Frank Gill shows how to migrate a database from on-premises to an Azure SQL Managed Instance:

If you have run through my last Managed Instance blog post, you have a Managed Instance at your disposal.  The PowerShell script for creating the network requirements also contains steps to create an Azure VM in a different subnet in the same VNet.  Unless you have a site-to-site VPN or Express Route between your on-prem environment and Azure, you will use this VM to connect to your Managed Instance.

Install Management Studio on the Azure VM.  To connect to your Managed Instance, you will need the host name for your Managed Instance.  You can find the Managed Instance host name on the resource page for your Managed Instance in the Portal.
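
From there, the migration itself typically goes through a native backup in blob storage, which a Managed Instance can restore directly from a URL. A rough sketch via pyodbc, assuming you already have a .bak file and a SAS token; every name below is a placeholder:

import pyodbc

# Placeholders: the Managed Instance host name (from the portal) and admin credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<mi-name>.<dns-zone>.database.windows.net;"
    "DATABASE=master;UID=<admin>;PWD=<password>",
    autocommit=True,  # RESTORE cannot run inside a transaction
)
cur = conn.cursor()

# A credential holding a SAS token lets the instance read from the blob container.
cur.execute("""
CREATE CREDENTIAL [https://<account>.blob.core.windows.net/<container>]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<sas-token-without-leading-question-mark>';
""")

# Managed Instances restore native backups directly from a URL.
cur.execute("""
RESTORE DATABASE [MyDatabase]
FROM URL = 'https://<account>.blob.core.windows.net/<container>/MyDatabase.bak';
""")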

I think this migration story is a bit easier for DBAs than the old Azure SQL Database strategy of building dacpacs.

Priority Queuing In Azure SQL Data Warehouse

Matt How walks us through an improvement to Azure SQL Data Warehouse:

The concept of workload management is a key factor for Azure SQL DW, as there are only a limited number of concurrency slots available and, depending on the resource class, these slots can fill up pretty quickly. Once the concurrency slots are full, queries are queued until a sufficiently sized slot opens up. Let’s recap what Resource Classes are and how they affect workload management.

A Resource Class is a pre-configured database role that determines how much resource is allocated to queries coming from users that belong to that role. For example, an ETL service account may use a “large” resource class and be allocated a generous amount of the server, whereas an analyst may use a “small” resource class and therefore only use up a small amount of the server with their queries. There are actually two types of resource class, Dynamic and Static. The Dynamic resource classes grant a set percentage of memory to a query, and the actual value of this percentage will vary as the Warehouse scales up and down. The key factor is that an xLargeRc (extra-large resource class) will always take up 70% of the Server and will not allow any other queries to run concurrently. No matter how much you scale up the Warehouse, queries run with an xLargeRc will run one at a time. Conversely, queries run with a smallrc will only be allocated 4% of the Server, so as the Warehouse scales up, this 4% becomes a larger amount of resource and can therefore process data more quickly.
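
As a quick reminder of the mechanics, a resource class is assigned like any other database role membership, via sp_addrolemember. A minimal sketch, with the connection string and user name as placeholders:

import pyodbc

# Placeholder connection string pointing at your Azure SQL Data Warehouse.
conn = pyodbc.connect("<connection string>", autocommit=True)

# A resource class is just a database role; adding a user changes the
# memory grant (and concurrency slots) its queries get from then on.
conn.cursor().execute("EXEC sp_addrolemember 'largerc', 'etl_user';")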

This looks like a useful addition.  Click through for a few examples of how it will work.

The Value Of Power BI Dataflows

Matt Allington gets to the core benefits of Power BI Dataflows:

Dataflows are:

  1. An online service provided by Microsoft as part of Power BI (software as a service, or SaaS).

  2. In effect, dataflows are an online data collection and storage tool.

    • Collection:  It uses Power Query to connect to the data at the source and transform that data as needed.
      • You will need to be able to access the data, either through a cloud service (such as Dynamics 365) or via a gateway to your PC/network.
      • You can also use Power Query to write queries from scratch, such as my Power BI calendar table.
    • Storage:  Dataflows then store that data in a table in the cloud so it can be used directly inside PowerBI.com, but more importantly (from my view) directly from Power BI Desktop.
  3. Dataflows leverage the Power Query skills you have learnt (or are learning) using other tools (like Power BI Desktop, Power Query for Excel) allowing you to reuse those same skills in this online tool.

  4. Tables that are created as a result of the dataflow are stored in an Azure Data Lake.

    • If you don’t know what that is, don’t worry – I don’t understand it either.  The point is it doesn’t matter because it is all done automatically for you by the tool.
  5. Dataflows include the concept of the common data service (CDS) or common data model directly in the tool and you don’t have to know what it is, nor care.

    • If you don’t know what that is, don’t worry – it doesn’t matter now/yet.

    • This will become very important as it will make the process of getting data out of complex databases (such as MS Dynamics 365) much easier in the future.

Click through for more detail as well as some good uses for Dataflows.

Controlling Azure Services In R With AzureR

Hong Ooi announces a new set of packages called AzureR:

As background, some of you may remember the AzureSMR package, which was written a few years back as an R interface to Azure. AzureSMR was very successful and gained a significant number of users, but it was never meant to be maintainable in the long term. As more features were added it became more unwieldy until its design limitations became impossible to ignore.

The AzureR family is a refactoring/rewrite of AzureSMR that aims to fix the earlier package’s shortcomings.

The core package of the family is AzureRMR, which provides a lightweight yet powerful interface to Azure Resource Manager. It handles authentication (including automatically renewing when a session token expires), managing resource groups, and working with individual resources and templates. It also calls the Resource Manager REST API directly, so you don’t need to have PowerShell or Python installed; it depends only on commonly used R packages like httr, jsonlite and R6.
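
To make “calls the Resource Manager REST API directly” concrete, here is the flavor of raw call that AzureRMR wraps, sketched in Python; the token acquisition and subscription ID are placeholders:

import requests

# Placeholders: a real bearer token comes from Azure Active Directory
# (e.g. via a service principal); AzureRMR handles this step for you.
token = "<bearer-token-from-azure-ad>"
subscription_id = "<subscription-id>"

# The raw Resource Manager call underneath "list my resource groups".
resp = requests.get(
    f"https://management.azure.com/subscriptions/{subscription_id}/resourcegroups",
    params={"api-version": "2018-05-01"},
    headers={"Authorization": "Bearer " + token},
)
resp.raise_for_status()

for rg in resp.json()["value"]:
    print(rg["name"], rg["location"])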

This won’t replace the PowerShell libraries, but it looks like it’d be useful for scenarios where you need to set up a VM, train a model, and then shut down the VM.

Azure Databricks Geospatial Analysis

Jose Mendes gives us an example of using Azure Databricks to perform geospatial analysis:

Magellan is a distributed execution engine for geospatial analytics on big data. It is implemented on top of Apache Spark and deeply leverages modern database techniques like efficient data layout, code generation and query optimization in order to optimize geospatial queries (further details here).

Although the Magellan GitHub page mentions that the 1.0.5 library is available for Apache Spark 2.3+ clusters, I learned through a very difficult process that the only way to make it work in Azure Databricks is with an Apache Spark 2.2.1 cluster with Scala 2.11. The cluster I used for this experiment consisted of a Standard_DS3_v2 driver type with 14GB memory, 4 cores, and auto scaling enabled.

In terms of datasets, I used the NYC Taxicab dataset to create the geometry points and the Magellan NYC Neighbourhoods GeoJSON dataset to extract the polygons. Both datasets were stored in blob storage and added to Azure Databricks as a mount point.
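
For reference, mounting blob storage in an Azure Databricks Python notebook looks something like this; the storage account, container, and access key are placeholders:

# Run inside an Azure Databricks notebook, where dbutils is available.
# Placeholders: storage account name, container, and access key.
dbutils.fs.mount(
    source="wasbs://<container>@<account>.blob.core.windows.net",
    mount_point="/mnt/nyctaxi",
    extra_configs={
        "fs.azure.account.key.<account>.blob.core.windows.net": "<access-key>"
    },
)

# Once mounted, the files are visible like any other path.
display(dbutils.fs.ls("/mnt/nyctaxi"))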

It sounds like this is much faster than using U-SQL to perform the same task.

Creating A SQL Server 2019 Big Data Cluster On Azure

Niels Berglund walks us through the setup for SQL Server 2019 Big Data Clusters:

If you, like me, are a SQL Server guy, you are probably quite familiar with installing SQL Server instances by mounting an ISO file and running setup. Well, you can forget all that when you deploy a SQL Server 2019 Big Data Cluster. The setup is all done via Python utilities and various Docker images pulled from a private repository. So, you need Python 3. On my box I have Python 3.5, and – according to Microsoft – version 3.7 also works. Make sure that your Python installation is on the path.

When you deploy, you use a Python utility: mssqlctl. To download mssqlctl, you need Python’s package management system pip installed. During installation you also need a Python HTTP library, Requests. If you do not have it, you need to install it:

python -m pip install requests

This isn’t available to the general public quite yet, but when it is publicly available (or if you are part of the Early Access Program), the instructions are nice and clear.
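
For reference, the deployment step in the CTP Niels describes boiled down to setting a handful of environment variables and running one mssqlctl command. Treat the sketch below as provisional: the variable names come from the EAP-era documentation and may well change before general availability.

# Credentials the deployment reads from the environment (values are placeholders):
export CONTROLLER_USERNAME=admin
export CONTROLLER_PASSWORD=<strong-password>
export KNOX_PASSWORD=<strong-password>
export MSSQL_SA_PASSWORD=<strong-password>

# Kick off the deployment against your current Kubernetes context:
mssqlctl create cluster sqlbigdata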
