Category: Cloud

Forcing Job Order In Azure DevOps

Published 2019-03-13 by Kevin Feasel

Chris Taylor shows us how to set up job dependencies in Azure DevOps:

When I first started with VSTS and ultimately Azure DevOps, I went through many failed builds because the order of the jobs in your pipeline don’t run in the order that you’ve built them and how you would logically believe them to run. The image below shows two Build Pipeline jobs but when the build is queued, whether this be manual or via CI, the second job is running before job #1. In this example the build will fail because Job #2 is to deploy a dacpac to a SQL Server on Linux Docker Container (Using Ubuntu Agent Host) but obviously this cannot be done until the dacpac has been created in Job #1 which is running on a VS2017 Agent Host:

Click through to see how it’s done.

Comments closed

Working with Hive in HDInsight

Published 2019-03-12 by Kevin Feasel

Brad Llewellyn takes us through building an HDInsight cluster and writing Hive queries against it:

Hive is a “SQL on Hadoop” technology that combines the scalable processing framework of the ecosystem with the coding simplicity of SQL. Hive is very useful for performant batch processing on relational data, as it leverages all of the skills that most organizations already possess. Hive LLAP (Low Latency Analytical Processing or Live Long and Process) is an extension of Hive that is designed to handle low latency queries over massive amounts of EXTERNAL data. One of this coolest things about the Hadoop SQL ecosystem is that the technologies allow us to create SQL tables directly on top of structured and semi-structured data without having to import it into a proprietary format. That’s exactly what we’re going to do in this post. You can read more about Hive here and here and Hive LLAP here.

We understand that SQL queries don’t typically constitute traditional data science functionality. However, the Hadoop ecosystem has a number of unique and interesting data science features that we can explore. Hive happens to be one of the best starting points on that journey.

Click through for the screenshot-laden demonstration.

Comments closed

Querying Cosmos DB From SQL Server

Published 2019-03-11 by Kevin Feasel

Jovan Popovic shows how you can use the Cosmos DB ODBC driver to perform OPENROWSET queries against Cosmos DB:

Now you need to install ODBC Driver for CosmosDB on the computer where you have SQL Server installed. I’m using Microsoft Azure Cosmos DB ODBC 64-bit.msi for 64-bit Windows – 64-bit versions of Windows 8.1 or later, Windows 8, Windows 7, Windows Server 2012 R2, Windows Server 2012, and Windows Server 2008 R2.
Once you install this driver, you should setup ODBC source in system DSN and test the connection:

If you’re running SQL Server 2019, you can follow Jovan’s first two steps and then create an external data source and table with PolyBase to get to the same results.

Comments closed

Azure Blob Storage and Data Lake Storage Gen2

Published 2019-03-11 by Kevin Feasel

Melissa Coates shows what you need to know about Azure Blob Storage with Azure Data Lake Storage Gen2:

– You may need to consider separate storage accounts if you need to segregate access control (RBAC), virtual networks, access keys, and the like. (Note that RBAC can also be set at the container level too, but ACL type permissions only apply to ADLS Gen2 and not to blob storage.)
– If you don’t need the hierarchical namespace whatsoever (for non-analytical use cases), this could mean a separate storage account. The storage cost is the same but transaction costs are higher when the HNS is enabled (discussed in item #8 of this post).

Click through for more details, including several more tips about Azure Storage Accounts, Azure Blob Storage Containers, and the Azure Storage Blobs themselves.

Comments closed

Passing Messages to Azure Service Bus via Data Factory

Published 2019-03-11 by Kevin Feasel

Rayis Imayev shows how we can use Logic Apps to let Azure Data Factory send messages to Azure Service Bus:

Summary:
1) Azure Data Factory and Service Bus can find common grounds to communicate with each other, and Azure Logic Apps could serve as a good mediator to establish this type of messaging communication.
2) As soon as messages land in a service bus queue, it’s now a responsibility of recipient side to obtain and process those message, which may be part of another blog post.

Click through for a demo of the process.

Comments closed

Deleting in Azure Data Factory

Published 2019-03-07 by Kevin Feasel

Meagan Longoria is happy that Azure Data Factory v2 now has a Delete activity:

It is a common practice to load data to blob storage or data lake storage before loading to a database, especially if your data is coming from outside of Azure. We often create a staging area in our data lakes to hold data until it has been loaded to its next destination. Then we delete the data in the staging area once our subsequent load is successful. But before February 2019, there was no Delete activity. We had to write an Azure Function or use a Logic App called by a Web Activity in order to delete a file. I imagine every person who started working with Data Factory had to go and look this up.
But now Data Factory V2 has a Delete activity.

Meagan shows how it works, what kinds of parameters you can set, and a couple of gotchas, so check it out.

Comments closed

Querying CosmosDB with PolyBase

Published 2019-03-06 by Kevin Feasel

Hasan Savran shows how you can query CosmosDB with T-SQL statements:

SQL Server Polybase Services lets us to pull data from other data resources by using T-SQL queries. SQL Server 2019 introduces new connectors for Polybase services like Oracle, Teradata and MongoDB. In one of the SQL Server 2019 presentations from Bob Ward, I saw the CosmosDB logo when he was talking about the new connectors of SQL Server 2019. CosmosDB already has an ODBC driver and you can use it as a datasource for your Power BI, SSIS or SSMS. SQL Server 2019 makes this connection easier by using the Polybase services.

In this post, I am going to show you how to configure SQL Server 2019 Polybase to connect Azure CosmosDB Mongo databases

I have a bit of a vested interest in this, so it is heartening to see people trying out PolyBase. It’s more heartening when they use it after they try it out.

Comments closed

Restoring Databases From Azure

Published 2019-03-04 by Kevin Feasel

John Morehouse shows how we can restore a database from Azure Blob Storage:

So how do you restore from Azure storage? You do so from an URL. Let’s take a look!
When you backup a database to Azure, there are two types of blobs that can be utilized, namely page and block blobs. Due to price and flexibly, it is recommended to use block blobs. However, depending on which type you used to perform the backup will dictate how the restores are performed. Both methods require the use a credential, so that information will need to be known before being able to restore from Azure.

Click through for examples using both page blobs and block blobs.

Comments closed

Azure Data Factory Data Flows

Published 2019-03-04 by Kevin Feasel

Marlon Ribunal shows how we can perform some amount of data transformation in an Azure Data Factory V2 data flow:

Azure Data Factory (ADF) offers a convenient cloud-based platform for orchestrating data from and to on-premise, on-cloud, and hybrid sources and destinations. But it is not a full Extract, Transform, and Load (ETL) tool. For those who are well-versed with SQL Server Integration Services (SSIS), ADF would be the Control Flow portion.
You can scale out your SSIS implementation in Azure. In fact, there are two (2) options to do this: SSIS On-Premise using the SSIS runtime hosted by SQL Server or On Azure using the Azure-SSIS Integration Runtime.
Azure Data Factory is not quite an ETL tool as SSIS is. There is that transformation gap that needs to be filled for ADF to become a true On-Cloud ETL Tool. The second iteration of ADF in V2 is closing the transformation gap with the introduction of Data Flow.

Despite it not being nearly as complete as SSIS, there are useful data transformations available in Azure Data Factory, as Marlon shows.

Comments closed

Automating Azure Storage To Move Between Tiers

Published 2019-02-28 by Kevin Feasel

Ryan Adams built a process to save money on storage costs for a customer’s test environment:

One of the best things about Azure, and the cloud in general, is we can automate most anything, and we are going to look at how to automate Azure VM Storage. This allows us to come up with some outside-of-the-box solutions. I had a customer with a road block that we were able to work around by automating some things with their Azure Virtual Machines.
Their challenge was that they wanted to move their test and development environments to Azure, but the storage cost was prohibitive. They needed premium storage to mimic their production environment, but it was not financially viable for test and development so they were going to keep it all on premises. During our conversations I learned that they only test between 8am and 5pm, Monday through Friday. My suggestion was that we put their databases on cheaper storage during off times and only premium when they are actively using it.

This doesn’t look like a one-hour task but if you’re in need of some cost savings on storage in non-production environments, check out Ryan’s scripts.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31