Press "Enter" to skip to content


Automating Pipeline Migration to Synapse via Azure DevOps

Kevin Chant deploys some Synapse pipelines:

In this post I want to cover how you can automate a pipeline migration to a Synapse workspace using Azure DevOps. It's a follow-up to a previous post I did about one way to copy an Azure Data Factory pipeline to Synapse Studio.

Because even though that post is good, it deserves a follow-up showing an automated way of doing it. I wanted to show that it can be done more gracefully.

And we all want to be graceful, right?


Databricks Delta Sharing for Azure

Will Girten, et al, announce Delta Sharing on Azure:

Included in this release is a new and improved API for listing all the tables under all schemas in a share. The new API supports pagination similar to other APIs.

For example, to list all the tables in the Delta share my_share, you can simply send a GET request to the /shares/{share_name}/all-tables endpoint on the sharing server.

Prior to that, you might want to read up on Delta Sharing.
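To make that endpoint concrete, here is a minimal sketch of paging through it with Python's requests library. The server URL, bearer token, and share name are placeholders, and the maxResults/nextPageToken parameters follow the general Delta Sharing REST protocol rather than anything specific to this release.

```python
import requests

# Placeholder values; a real client would normally read these from the
# Delta Sharing profile file supplied by the data provider.
ENDPOINT = "https://sharing.example.com/delta-sharing"
TOKEN = "<bearer-token>"


def list_all_tables(share_name):
    """Page through /shares/{share_name}/all-tables and yield each table."""
    params = {"maxResults": 100}
    while True:
        resp = requests.get(
            f"{ENDPOINT}/shares/{share_name}/all-tables",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params=params,
        )
        resp.raise_for_status()
        body = resp.json()
        for table in body.get("items", []):
            yield table
        # The server returns nextPageToken only when more results remain.
        next_token = body.get("nextPageToken")
        if not next_token:
            break
        params["pageToken"] = next_token


for table in list_all_tables("my_share"):
    print(table["schema"], table["name"])
```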


Using Synapse Link for Cosmos DB

I have a post combining Synapse Link for Cosmos DB and the Spark to Synapse SQL Connector:

In this post, we saw how to enable Cosmos DB’s Analytical store, access data using Synapse Link for Cosmos DB, and use the Spark to Synapse SQL Connector to move that data into a dedicated SQL pool. We saw how to do this in a workspace using a managed virtual network with data exfiltration protection enabled, meaning this is the largest number of steps necessary.

Click through for product descriptions and step-by-step instructions.
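As a rough sketch of the shape of that flow (not the exact code from the post), reading the analytical store and landing the data in a dedicated SQL pool from a Synapse Spark notebook looks something like this. The linked service, container, and pool/table names are all placeholders.

```python
# Runs inside a Synapse Spark notebook, where `spark` is the session Synapse
# provides. The linked service, container, and table names below are made up.

# Read from the Cosmos DB analytical store through Synapse Link.
orders_df = (
    spark.read.format("cosmos.olap")
    .option("spark.synapse.linkedService", "CosmosDbLinkedService")
    .option("spark.cosmos.container", "Orders")
    .load()
)

# Write to a dedicated SQL pool table with the Spark to Synapse SQL Connector.
# Recent Synapse Spark runtimes expose synapsesql() from PySpark; older
# runtimes only expose it from Scala.
(
    orders_df.write
    .mode("overwrite")
    .synapsesql("SQLPool01.dbo.Orders")
)
```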


Scheduling Azure ML Compute Instance Start-Up and Shut-Down

I have a post correcting a statement I made before:

The single biggest problem I have with compute instances is that there is no auto-stop functionality to them. This is really frustrating because you’re paying for that virtual machine like you would any other, so if you forget to turn it off when you go home for the weekend, it’ll cost you. I wish there were a built-in option to shut off a compute instance after a certain amount of inactivity. Instead, you’ll need to start and stop them manually.

It turns out that you can, and so I wanted to write a post to correct the record.

Click through to see how you can do this. You can bet that I’ve got it enabled now.
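For reference, here is roughly what an auto-stop schedule looks like through the azure-ai-ml (v2) Python SDK. This isn't necessarily the approach from the post, and the workspace details, instance name, and stop time are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ComputeInstance,
    ComputeSchedules,
    ComputeStartStopSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
)

# Placeholder workspace details.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Stop the compute instance every weekday at 18:00.
stop_trigger = RecurrenceTrigger(
    frequency="week",
    interval=1,
    schedule=RecurrencePattern(
        week_days=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
        hours=[18],
        minutes=[0],
    ),
)
stop_schedule = ComputeStartStopSchedule(trigger=stop_trigger, action="stop")

compute = ComputeInstance(
    name="my-compute-instance",
    schedules=ComputeSchedules(compute_start_stop=[stop_schedule]),
)
ml_client.compute.begin_create_or_update(compute).result()
```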


Azure Synapse Analytics Integration Points

Warner Chaves takes us through several integration points with Azure Synapse Analytics:

Azure Stream Analytics allows for in-flight querying of streaming data from Blob Storage, Data Lake Storage, IoT Hub, or Event Hubs. The querying is done through an easily adoptable SQL language, and it really speeds up the development of a streaming solution.

The nice thing here is that Stream Analytics allows the use of a Synapse SQL Pool table as the target for the results of the streaming query. So, this is another way to do near real-time analytics by passing data from a streaming source through a Stream Analytics job and into a Synapse table. You could do this to pre-aggregate data on the fly, score data in real-time, perform real-time calculations over specific time or event windows, etc.

Click through for several examples of this.


Auto-Failover Groups for Azure SQL Hyperscale

Melody Zacharias fills us in on a recent announcement:

On January 5th, they announced that auto-failover groups for Azure SQL Hyperscale are now available in preview. Auto-failover groups are a feature that allows you to manage the failover and replication of a group of databases on a server or managed instance from one region to another region in Azure. This can be done manually or in conjunction with a user-defined policy.

Click through for more information on how it all works.


Automating Azure SQL DB Maintenance with Data Factory

Hiten Bhavsar works around the lack of SQL Agent:

As we know, it's crucial that you run database maintenance regularly in order to keep your database performance up with current statistics and healthy indexes. Here we provide another way to schedule this job using Azure Data Factory; it can run on a scheduled time interval, whether weekly, bi-weekly, or monthly.

Click through for the process and try not to think too hard about this secretly being maintenance plans all over again.


Simple Mapping Data Flows in Synapse

Joshuha Owen announces a new feature:

This week, we are excited to announce the public preview for Map Data, a new feature for Azure Synapse Analytics and Database Templates! The Map Data tool is a guided process to help users create ETL mappings and mapping data flows from their source data to Synapse lake database tables without writing code. This experience will help you get started with transformations into your Synapse lake database quickly while still giving you the power of Mapping Data Flows.

This process starts with the user choosing the destination tables in Synapse lake databases and then mapping their source data into these tables. We will be following up with a demo video shortly.

Click through for more details on how it works.


Azure ML Deployments and Endpoints

I continue a series on low-code machine learning with Azure ML:

The first thing we need to do is create an inference pipeline. Inference pipelines differ from training pipelines in that they won't use the training dataset, but they will accept user input and provide a scored response. There are two types of inference pipeline: real-time and batch. Real-time inference pipelines are intended for small-set work. We'll host a service on some compute resource in Azure and people will make REST API calls to our service, sending in a request with a few items to score, and we'll send back classification results.

By contrast, a batch pipeline is what you’d use if you have a nightly job with tens of millions of items to score. In that case, the typical pattern is to have a service listening for changes in a storage account and, some time after people drop new files into the proper folder, the batch inference process will pick up these files, score the results, and write those results out to a destination location.

This post is all about real-time inference pipelines. The next post will be all about batch pipelines.
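Since the real-time option boils down to a REST call, here is a hedged sketch of what a client request against a deployed real-time endpoint might look like. The scoring URI, key, and input fields are placeholders that depend entirely on how the endpoint and model were set up.

```python
import requests

# Placeholder values; Azure ML shows the real scoring URI and key on the
# endpoint's Consume tab after deployment.
SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<endpoint-key>"

# A small set of items to score; field names depend on the model's schema.
payload = {
    "Inputs": {
        "input1": [
            {"feature_a": 1.2, "feature_b": 0.4},
            {"feature_a": 3.1, "feature_b": 0.9},
        ]
    }
}

resp = requests.post(
    SCORING_URI,
    json=payload,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
resp.raise_for_status()
print(resp.json())  # classification results come back in the response body
```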
