Press "Enter" to skip to content

Category: Cloud

Using The Cosmos DB Emulator

Cedric Charlier shows off how to use the Azure Cosmos DB emulator to create and query a graph database:

When developing on Azure Cosmos DB, Microsoft’s globally distributed, horizontally partitioned, multi-model database service, it’s useful to use the local emulator. At the moment, the web interface of the Data Explorer is oriented toward the SQL API, and it seems impossible to create a graph query on the emulator. This is false: the UI is not aware of these features, but the engine does support the Graph API.

You can access the emulator engine by means of the .NET libraries. Create a new console project in Visual Studio and add the package Microsoft.Azure.Graphs. This package is in preview, so you’ll need to tell the package manager that you accept prerelease packages.
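
As a quick sketch, opting into previews from the Package Manager Console (itself a PowerShell host) looks something like this; the exact preview version available will vary:

# Run in the Visual Studio Package Manager Console
# -IncludePrerelease tells NuGet to accept preview (prerelease) packages
Install-Package Microsoft.Azure.Graphs -IncludePrerelease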

Read on to learn more.

SQL Server Backups On Azure VM

Rolf Tesmer shows us various options available for backing up SQL Server on Azure VMs:

Recently, I had a requirement to collate and briefly compare some of the various methods to perform SQL Server backups for databases deployed onto Azure IaaS machines. The purpose was to provide a few options to cater for the different types (OLTP, DW, etc.) and sizes (small to big) of databases that could be deployed there.

Up front, I am NOT saying that these are the ONLY options to perform standard SQL backups! I am sure there are others; however, the options below are both supported and well documented, which, when it comes to something as critical as backups, is pretty important.

So the purpose of this post is to provide a quick list of the various SQL backup methods!

Read on for the options.

Azure And The Kappa Architecture

Jared Zagelbaum describes the Kappa architecture and points out how there’s limited built-in support in Azure for it:

You can’t support kappa architecture using native cloud services. Cloud providers, including Azure, didn’t design streaming services with kappa in mind. Running streams with a TTL greater than 24 hours is more expensive, and generally, the max TTL tops out around 7 days. If you want to run kappa, you’re going to have to use Platform as a Service (PaaS) or Infrastructure as a Service (IaaS), which adds more administration to your architecture. So, what might this look like in Azure?
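
To make that retention ceiling concrete, here is a hedged sketch of creating an Event Hub at its maximum retention with the AzureRM cmdlets; all names are hypothetical, and parameter names vary somewhat across AzureRM.EventHub versions:

# Hypothetical names; -MessageRetentionInDays is capped at 7 for Event Hubs
New-AzureRmEventHub -ResourceGroupName "rg-streaming" `
    -Namespace "ehns-kappa" `
    -Name "eh-events" `
    -MessageRetentionInDays 7 `
    -PartitionCount 4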

Read the whole thing.

The Whys Of Azure ML Workbench

Ginger Grant explains why Azure Machine Learning Workbench exists:

Microsoft intends Azure Machine Learning Workbench to be more than a tool for machine learning analysis. It is part of a system to manage and monitor the deployment of machine learning solutions with Azure Machine Learning Model Management, and the management aspects are part of the application installation. The Azure Machine Learning Workbench download is available only by creating an account in Microsoft’s Azure environment, where a Machine Learning Model Management resource will be created as part of the install. Within this resource, you will be directed to create a virtual environment in Azure where you will deploy and manage Machine Learning models.

This migration into management of machine learning components is part of a pattern first seen in the on-premises version of data science functionality. First, Microsoft helped companies manage the deployment of R code with SQL Server 2016, which includes the ability to move R code into SQL Server. Providing this capability decreased the time it took to implement a data science solution by providing a means for the code to be deployed easily, without the need for the R code to be re-written or included in another application. SQL Server 2017 expanded on this idea by allowing Python code to be deployed into SQL Server as well. With the cloud service Model Management, Microsoft is hoping to centralize the implementation so that all Machine Learning services created can be managed in one place.
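
For reference, the SQL Server capability described above is exposed through sp_execute_external_script; a minimal sketch, wrapped in Invoke-Sqlcmd with a hypothetical server name, looks like this:

# Hypothetical server name; requires SQL Server 2016+ with R Services enabled
Invoke-Sqlcmd -ServerInstance "MySqlServer" -Query @"
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet',  -- R code runs in-database
    @input_data_1 = N'SELECT 1 AS val';
"@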

Read on for more.

Lambda Architecture In Azure

Jared Zagelbaum describes the Lambda architecture pattern and explains how you can use tooling in Azure to implement it:

Lambda is an organic result of the limitations of existing tools. Distributed systems architects and developers commonly criticize its complexity, and rightly so. Those of us who have worked extensively in Extract-Transform-Load and symmetric multiprocessing systems see red flags when code is replicated in multiple services. Ensuring data quality and code conformity across multiple systems, whether massively parallel processing (MPP) or symmetric multiprocessing (SMP), has the same best practice: the fewest times you reproduce code is always the correct number of times.

We reproduce code in lambda because different services in MPP systems are better at different tasks. The maturity of tools historically hasn’t allowed us to process streams and batches in a single tool. This is starting to change, with Apache Spark emerging as a single preferred compute service for stream and batch querying, hence the timing of Azure Databricks. However, on the storage side, the data lake, which was meant to be an immutable store, can in practice become the dreaded swamp when governance or testing fails, which is not uncommon. A fundamentally different assumption about how we process data is required to combat this degradation. Enter the kappa architecture, which we’ll examine in the next post of this series.

Interesting reading.

What’s Happening In Azure Data Factory Right Now?

Melissa Coates has a couple of PowerShell scripts to figure out which pipelines are currently running in Azure Data Factory v1:

This is a quick post to share a few scripts to find what is currently executing in Azure Data Factory. These PowerShell scripts are applicable to ADF version 1 (not version 2, which uses different cmdlets).

Prerequisite: In addition to having installed the Azure Resource Manager modules, you’ll have to register the provider for Azure Data Factory:

# One-time registration of the ADF provider
Register-AzureRmResourceProvider -ProviderNamespace Microsoft.DataFactory
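
Melissa’s actual scripts are in the post; as a rough sketch of the idea, the ADF v1 cmdlets can surface in-flight work something like this (resource group and factory names are hypothetical, and the exact state strings may differ):

# Hypothetical names; requires the AzureRM.DataFactories module (ADF v1)
$windows = Get-AzureRmDataFactoryActivityWindow `
    -ResourceGroupName "MyResourceGroup" `
    -DataFactoryName "MyDataFactoryV1"

# Keep only the activity windows that are currently executing
$windows | Where-Object { $_.WindowState -eq 'InProgress' } |
    Select-Object PipelineName, ActivityName, WindowStart, WindowEnd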

Click through for the Powershell snippets.

Setting Up SparklyR In Azure

David Smith shows how you can spin up a Spark cluster in Azure and install SparklyR on top of it:

The SparklyR package from RStudio provides a high-level interface to Spark from R. This means you can create R objects that point to data frames stored in the Spark cluster and apply some familiar R paradigms (like dplyr) to the data, all the while leveraging Spark’s distributed architecture without having to worry about memory limitations in R. You can also access the distributed machine-learning algorithms included in Spark directly from R functions.

If you don’t happen to have a cluster of Spark-enabled machines set up in a nearby well-ventilated closet, you can easily set one up in your favorite cloud service. For Azure, one option is to launch a Spark cluster in HDInsight, which also includes the extensions of Microsoft ML Server. While this service recently had a significant price reduction, it’s still more expensive than running a “vanilla” Spark-and-R cluster. If you’d like to take the vanilla route, a new guide details how to set up a Spark cluster on Azure for use with SparklyR.
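
If you do take the HDInsight option, the PowerShell equivalent looks roughly like the sketch below; every name, credential, and storage value here is a placeholder:

# Placeholder names and credentials; requires the AzureRM.HDInsight module
$httpCred = Get-Credential -Message "Cluster login (HTTP)"
$sshCred  = Get-Credential -Message "SSH login"

New-AzureRmHDInsightCluster `
    -ClusterName "spark-sparklyr" `
    -ResourceGroupName "rg-spark" `
    -Location "East US" `
    -ClusterType Spark `
    -ClusterSizeInNodes 4 `
    -HttpCredential $httpCred `
    -SshCredential $sshCred `
    -OSType Linux `
    -DefaultStorageAccountName "mystorageacct.blob.core.windows.net" `
    -DefaultStorageAccountKey "<storage-account-key>" `
    -DefaultStorageContainer "spark-sparklyr"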

Read on for more details.

Comparing Data Lake Job Runs

Yanan Cai shows how to compare stats on different executions of a job:

Troubleshooting issues in a recurring job is a time-consuming task. It starts with searching through the Job Browser to find instances of a recurring job and identifying both baseline and anomalous performance. This is followed by multi-way comparisons between job instances to figure out what has changed in the query, data, or environment, and then by analysis to discover which changes may have a performance impact. While this is happening, production workloads continue to underperform or go down.

Azure Data Lake Tools for Visual Studio now makes it easy to spot anomalies and quickly trace the key characteristics across recurring job instances, allowing for an efficient debugging experience. The Pipeline Browser automatically groups recurring jobs to simplify discovery of all runs. The Related Job View collects data about inputs, outputs, and execution across multiple runs into a single visualization.
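
If you’d rather script part of the comparison, the Data Lake Analytics cmdlets expose the same run history; here is a hypothetical sketch comparing the durations of a recurring job by name (account and job names are made up):

# Hypothetical account and job names; requires AzureRM.DataLakeAnalytics
Get-AzureRmDataLakeAnalyticsJob -Account "myadlaaccount" -Name "DailyAggregation" |
    Where-Object { $_.State -eq 'Ended' } |
    Select-Object JobId, Result,
        @{ Name = 'Duration'; Expression = { $_.EndTime - $_.StartTime } } |
    Sort-Object Duration -Descending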

Read on for more.

SQL Server Backups To Azure Blob Storage

Kevin Hill shows how to configure SQL Server to back up a database to Azure blob storage:

Note the “no blobs found” in the container. After a successful backup, you will see it here.

Click on ‘Container Properties’ to get the URL for this specific container; this will be used in the Backup and Restore statements. Click the button next to the URL to copy it. For now, just remember where this is, or copy it to Notepad, a query window, etc. When we start to build our T-SQL statements, we will need both the access key from earlier and the URL.
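
Kevin’s post builds the full statements; the general credential-plus-URL pattern he describes looks something like this sketch, with placeholder storage account, key, and database names:

# Placeholder storage account, access key, and database names
Invoke-Sqlcmd -ServerInstance "MySqlServer" -Query @"
CREATE CREDENTIAL AzureBackupCred
    WITH IDENTITY = 'mystorageaccount',  -- the storage account name
    SECRET = '<access-key>';             -- the access key from the portal

BACKUP DATABASE MyDatabase
    TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
    WITH CREDENTIAL = 'AzureBackupCred', COMPRESSION, STATS = 10;
"@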

Kevin gives clear, step-by-step instructions on the process.

Specifying An IP Address On A Point-To-Site VPN

John Morehouse shows how to force a particular IP address when building an Azure point-to-site VPN:

Recently, I got to work with a client on something interesting. We implemented transactional replication to send data to an Azure virtual machine. This was done to perform some testing for a project.

Given that the two machines were NOT within the same Active Directory domain, we wanted to make sure our client’s data was protected, so we utilized a Point-to-Site VPN to facilitate this. With the client using a VPN connection, this helps ensure that any data transmitted to the virtual machine is encrypted and secured. Note: the process of configuring and implementing the VPN connection is a topic for another blog post.

SQL Server replication requires the use of a server name rather than just an IP address. This meant that the virtual machine in Azure had to use an entry in its local hosts file that pointed back to the client’s machine.

The downside? When the VPN connection drops (it happens), the client machine obtains a new IP address upon reconnecting. The hosts file would then potentially have an incorrect IP address and need to be updated.
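
To make the hosts-file dependency concrete, manually refreshing such an entry with PowerShell might look like this sketch (the hostname and IP are hypothetical, and this manual chore is exactly what John wants to avoid):

# Hypothetical hostname and IP; run elevated on the Azure VM
$hostsPath = "$env:SystemRoot\System32\drivers\etc\hosts"
$hostName  = "CLIENTMACHINE"
$newIp     = "172.16.201.5"

# Drop any stale entry for the host, then append the current IP
$lines = @(Get-Content $hostsPath | Where-Object { $_ -notmatch "\s$hostName$" })
$lines + "$newIp`t$hostName" | Set-Content $hostsPath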

Read on to see how John was able to solve this.
