
Category: Cloud

Getting Started With Azure Databricks

David Peter Hansen has a quick walkthrough of Azure Databricks:

RUN MACHINE LEARNING JOBS ON A SINGLE NODE

A Databricks cluster has one driver node and one or more worker nodes. The Databricks runtime includes commonly used Python libraries, such as scikit-learn. However, these libraries do not distribute their algorithms.

Running an ML job only on the driver might not be what we are looking for. It is not distributed, and we could just as well run it on our own computer or in a Data Science Virtual Machine. However, some machine learning tasks can still take advantage of distributed computation, and it is a good way to take an existing single-node workflow and transition it to a distributed workflow.

This great example notebook that uses scikit-learn shows how this is done.
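A common first step in that transition is to distribute an embarrassingly parallel piece of the workflow, such as a hyperparameter search, with one plain scikit-learn model trained per Spark task. As a rough sketch of the idea (hypothetical code, not the notebook’s, assuming a Databricks-style spark session and synthetic data):

# Hypothetical sketch: distribute a scikit-learn hyperparameter search across Spark workers.
# Assumes an existing SparkSession named spark (as on a Databricks cluster); data is synthetic.
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

# Candidate hyperparameter combinations to evaluate in parallel.
param_grid = [{"n_estimators": n, "max_depth": d}
              for n, d in product([50, 100, 200], [5, 10, None])]

def evaluate(params):
    # Train and score one model; this function runs single-node on a worker.
    model = RandomForestClassifier(random_state=42, **params)
    return params, cross_val_score(model, X, y, cv=3).mean()

# One Spark task per parameter combination; each task runs plain scikit-learn.
results = (spark.sparkContext
           .parallelize(param_grid, numSlices=len(param_grid))
           .map(evaluate)
           .collect())

best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)

Each task stays ordinary single-node scikit-learn; Spark only fans the independent training runs out across the workers.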

Read the whole thing.


Query Labels In Azure SQL Data Warehouse

Arun Sirpal demonstrates how to use query labels in Azure SQL Data Warehouse:

Using a query label in Azure SQL DW (Data Warehouse) can be a really handy technique to track queries via DMVs. You might want to do this to see what problematic queries are doing under the covers.

Let’s check out an example. First I will show you how things would look without using a query label. I connect to SQL DW and issue the following basic example query.
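The gist of the technique: tack a LABEL hint onto the statement, then find that statement again through the label column of sys.dm_pdw_exec_requests. Here is a rough, hypothetical sketch from Python with pyodbc; the OPTION (LABEL = ...) hint and the DMV are documented for Azure SQL DW, but the connection string, table, and label values below are placeholders, not Arun’s example:

# Hypothetical sketch: tag a query with a label, then find it again via the DMV.
# Connection string, table, and label values are placeholders, not the post's example.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydw;"
    "UID=myuser;PWD=mypassword"
)
cursor = conn.cursor()

# The LABEL query hint tags this statement so it is easy to find later.
cursor.execute("""
    SELECT COUNT(*) AS row_count
    FROM dbo.FactSales
    OPTION (LABEL = 'nightly-sales-count');
""")
print(cursor.fetchone().row_count)

# Later, possibly from another session, look the query up by its label.
cursor.execute("""
    SELECT request_id, status, submit_time, total_elapsed_time
    FROM sys.dm_pdw_exec_requests
    WHERE [label] = 'nightly-sales-count';
""")
for row in cursor.fetchall():
    print(row.request_id, row.status, row.total_elapsed_time)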

It’s an interesting approach and solves a problem I saw in PolyBase around figuring out which session details were yours after the fact.


Dataflows In Power BI

James Serra gives us a preview of Power BI Dataflows:

In short, Dataflows integrate data lake and ETL technology directly into Power BI, so anyone with Power Query skills (yes – Power Query is now part of the Power BI service, not just Power BI Desktop, and is called Power Query Online) can create, customize, and manage data within their Power BI experience (think of it as self-service data prep).  Dataflows include a standard schema, called the Common Data Model (CDM), that contains the most common business entities across major functions such as marketing, sales, service, and finance, along with connectors that ingest data from the most common sources into these schemas.  This greatly simplifies modeling and integration challenges (it prevents multiple metadata definitions for the same data).  You can also extend the CDM by creating custom entities.  Lastly – Microsoft and their partners will be shipping out-of-the-box applications that run on Power BI, populate data in the Common Data Model, and deliver insights through Power BI.

A dataflow is not just the data itself, but also the logic for how the data is manipulated.  Dataflows belong to the Data Warehouse/Mart/Lake family.  Their main job is to aggregate, cleanse, transform, integrate, and harmonize data from a large and growing set of supported on-premises and cloud-based data sources, including Dynamics 365, Salesforce, Azure SQL Database, Excel, and SharePoint.  Dataflows hold a collection of data-lake-stored entities (i.e., tables) which are stored in internal Power BI Common Data Model-compliant folders in Azure Data Lake Storage Gen2.

Also check out the comments for some clarification on why you’d want to use Dataflows rather than doing the work directly in the data lake.


Azure SQL Data Warehouse Restore Points

Arun Sirpal explains how backups work with Azure SQL Data Warehouse:

The question is how are backups done with Azure SQL DW?

It is very different from Azure SQL DB (which you would expect). Azure SQL DW has a totally different architecture from its classic database counterpart. Restore points are the key here. Automatic ones are taken throughout the day and are kept for seven days only. In the worst case, the time between restore points will be eight hours, giving an eight-hour RPO (Recovery Point Objective).

You can also create manual restore points, as Arun shows.


Combining Apache Kafka With TensorFlow

Kai Waehner has an example of an application which uses Apache Kafka to stream car sensor data to TensorFlow on Google ML Engine:

A great benefit of Confluent MQTT Proxy is simplicity for realizing IoT scenarios without the need for an MQTT broker. You can forward messages directly from the MQTT devices to Kafka via the MQTT Proxy. This reduces efforts and costs significantly. This is a perfect solution if you “just” want to communicate between Kafka and MQTT devices.

If you want to see the other part of the story (integration with sink applications like Elasticsearch / Grafana), please take a look at the GitHub project “KSQL for streaming IoT data”. This realizes the integration with Elasticsearch and Grafana via Kafka Connect and the Elastic connector.
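To get a feel for the ingest side of such a pipeline, here is a minimal, hypothetical sketch that publishes simulated car-sensor readings to a Kafka topic with the kafka-python client; the broker address, topic name, and message fields are placeholders rather than Kai’s actual schema:

# Hypothetical sketch: publish simulated car-sensor readings to a Kafka topic.
# Broker address, topic name, and message fields are placeholders.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for _ in range(100):
    reading = {
        "car_id": random.randint(1, 10),
        "timestamp": int(time.time() * 1000),
        "speed_kmh": round(random.uniform(0, 180), 1),
        "engine_temp_c": round(random.uniform(70, 120), 1),
    }
    # Key by car so all readings from one car land in the same partition.
    producer.send("car-sensor-readings",
                  key=str(reading["car_id"]).encode("utf-8"),
                  value=reading)

producer.flush()

In Kai’s setup, messages like these would arrive via the MQTT Proxy or Kafka Connect rather than a hand-rolled producer, with KSQL and TensorFlow consuming them downstream.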

Check it out and then take a gander at Kai’s GitHub repo.


Azure Data Factory V2 Now In General Availability

Chris Seferlis covers some of the improvements in Azure Data Factory V2:

With ADF V2 you get a browser-based interface using drag and drop technology; V1 was primarily done in the Visual Studio IDE. It also added triggers for scheduling, so you can schedule your jobs when required and in additional ways (which I’ll discuss further in a bit).

Some other features of ADF V2 that came out as it became generally available:

  • Lift and shift operations for your SSIS packages, so if you have SSIS packages on-premises, you can now lift and shift those into compute with the Integration Runtime service in Data Factory.

  • This also allows for cloud-to-cloud, cloud-to-prem, and prem-to-prem scenarios, and some third-party tools are supported within that as well.

  • Control flow activities like branching on activity links, looping, conditional execution, and parameterization.

  • Integration with HDInsight Spark and Databricks for big data workloads and data science.

There’s more where that came from, too.


Overview: U-SQL Database Projects

Zach Stagers gives us an overview of the new U-SQL Database Project structure:

Source Control

The project integrates much more nicely with TFS than the older “U-SQL Project” does.

It gives you the icons (padlock, check mark, etc.) in Solution Explorer, so it actually looks like it’s under source control!

Something that I’d really hoped had been fixed, but hasn’t: when copying and renaming an existing item, it doesn’t recognize the rename. You have to undo the checkout of the non-existent object (the copy, before being renamed).

Read on for more improvements.


Azure SQL Database Service Tiers By Purchase Model

Glenn Berry explains the two purchase models available with Azure SQL Database, as well as the various service tiers within each model:

The older pricing option is the DTU-based SQL purchase model, where a fixed set of resources is assigned to the database from three performance tiers, which are Basic, Standard, and Premium.

Within the Standard and Premium tiers, there are multiple performance levels, which are classified according to how many Database Transaction Units (DTUs) they provide (along with their included storage and maximum available storage). The Premium tier is designed for I/O-intensive workloads, and is fault-tolerant.

The Database Transaction Unit (DTU) is based on a blended measure of CPU, memory, along with storage reads and writes. The DTU-based performance levels represent preconfigured bundles of compute, memory, and storage resources designed to drive different levels of application performance. If you do not want to worry about the underlying resources and prefer the simplicity of a preconfigured resource bundle while paying a fixed amount each month, you may find the DTU-based model more suitable for your needs and easier to understand.
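You can watch that blended measure yourself: sys.dm_db_resource_stats exposes the component percentages, and DTU utilization is effectively the highest of them at any point in time. A rough, hypothetical sketch from Python with pyodbc (connection details are placeholders):

# Hypothetical sketch: approximate recent DTU utilization from its component metrics.
# sys.dm_db_resource_stats keeps roughly an hour of 15-second samples; connection details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword"
)
cursor = conn.cursor()

cursor.execute("""
    SELECT TOP (20)
           end_time,
           avg_cpu_percent,
           avg_data_io_percent,
           avg_log_write_percent
    FROM sys.dm_db_resource_stats
    ORDER BY end_time DESC;
""")

for row in cursor.fetchall():
    # DTU utilization is effectively the largest of the component percentages.
    dtu_percent = max(row.avg_cpu_percent, row.avg_data_io_percent, row.avg_log_write_percent)
    print(row.end_time, f"~{dtu_percent:.1f}% DTU")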

Glenn does a good job clearing up some of the complications around pricing for Azure SQL Database.


Configuring An Azure Runbook For Index Maintenance

Jim Donahoe explains how to perform index and statistics maintenance for Azure SQL Database, where you don’t have SQL Agent available:

I had a lot of issues when I created my first one, and after discussing with some folks, I found they had the same issues.  I searched for the best blog posts that I could find on the subject, and the one I LOVED the most was here: Arctic DBA.  He broke it down so simply that I finally created my own pseudo-installer, and I wanted to share it with all of you.  Please bear in mind that these code snippets may fail at any time due to changes in Azure.

**IMPORTANT**

These next steps assume the following:

You have created/configured your Azure Automation Account and credential to use to execute this runbook.

Read on for a reasonably short PowerShell script and a modified version of Ola Hallengren’s index maintenance procedures.
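At its core, the runbook just schedules a T-SQL maintenance call. As a rough, hypothetical sketch of the kind of call being wrapped, here it is issued interactively from Python with pyodbc, assuming the standard parameters of Ola Hallengren’s IndexOptimize rather than the post’s modified version (connection details are placeholders):

# Hypothetical sketch: the kind of maintenance call the runbook ultimately schedules.
# Assumes Ola Hallengren's standard IndexOptimize procedure is installed in the database;
# the post uses a modified version, and connection details here are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword",
    autocommit=True,
)
cursor = conn.cursor()

# Reorganize or rebuild indexes based on fragmentation, then update statistics.
cursor.execute("""
    EXECUTE dbo.IndexOptimize
        @Databases = 'mydb',
        @FragmentationLow = NULL,
        @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE',
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE',
        @FragmentationLevel1 = 5,
        @FragmentationLevel2 = 30,
        @UpdateStatistics = 'ALL';
""")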


Building An Azure VM With Powershell

Garry Bargsley shows us how to provision and build a VM in Azure using nothing but PowerShell:

I spent the bulk of my day Wednesday going through the Prelab steps outlined in the lab.  I was extremely impressed by this lab and how every step was correct and accurate down to the letter.  Then the more I thought about it, I realized the steps are built around using an Azure Virtual Machine.  With this you get a common machine, framework, and steps to build around.  You do not have to worry about the users’ local settings or scenario.  You are starting from the exact same point of reference every time.  So it was fun to connect via SSH to a Linux machine and install SQL Server 2017 and Docker from the command line.  While I know it was easy because someone was telling me what to type, it was still fun to see how the other side (Linux people) live.

Today I was in an adventurous mood to try something new.  I had been wanting to put together a PowerShell script that would deploy an Azure Virtual Machine.  I started down the path a couple of times and got stuck, so I lost interest.  I thought this was the perfect opportunity to get over the hurdle and combine the Prelab steps in this lab with doing those steps in PowerShell.  So below you will find my first go at building an Azure Virtual Machine using PowerShell to replace the manual steps in the Prelab process.  Not that there was anything wrong with those steps; I just want to try and use a tool that I have been working to learn and use on a day-to-day basis.  Wish me luck.

Read on for a step-by-step guide.
