Press "Enter" to skip to content

Category: Cloud

Publishing Azure Data Factory via Azure DevOps

Kamil Nowinski shares how to deploy Azure Data Factory flows via Azure DevOps:

Struggling with #ADF deployment? The adf_publish branch doesn’t suit your purposes? Don’t have skills with PowerShell? I have good news for you. There is a new tool on the market. It’s a task for the Azure DevOps Release Pipeline that deploys a whole ADF from code (JSON files) to an ADF instance in Azure. Behind the scenes, it runs a PowerShell module which does all the work for you.
Sounds unbelievable? But it’s real! Check it out for yourself.

Click through for a video.


Thoughts on Snowflake Database Provisioning

David Stelfox takes us through some thoughts on provisioning instances of Snowflake:

For this example, I’ve chosen an open dataset of 2017 taxi rides in New York City. There are a few options for interacting with Snowflake: a dialog-box approach in the web-based GUI, SQL statements in the Worksheets tab of the GUI, or a CLI called SnowSQL. For this example, I used SQL statements, as I find it easier to follow what’s happening. Once you have set up your account (or trial) and logged in, you need to create your first database.
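
The post walks through these statements in detail; as a rough sketch (with illustrative database, schema, and column names, not ones taken from the article), the opening steps in Snowflake SQL look something like this:

-- Create a database and schema to hold the taxi ride data
CREATE DATABASE nyc_taxi;
USE DATABASE nyc_taxi;
CREATE SCHEMA rides;

-- A simple target table for 2017 trip records
CREATE OR REPLACE TABLE rides.trips (
    pickup_datetime  TIMESTAMP,
    dropoff_datetime TIMESTAMP,
    passenger_count  INTEGER,
    trip_distance    FLOAT,
    total_amount     NUMBER(10,2)
);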

Click through for some how-to as well as thoughts about cost and performance.


Backing Up Databases to Azure Blob Storage

David Fowler shows how you can back up databases to Azure Blob Storage:

SQL Server has given us the option to back up our databases directly to Azure Blob Storage for a while now, but it’s not something that I’ve had all that much call to use until recently.

So this is just going to be a quick walkthrough on how you can back up your on-premises SQL Servers to Azure Blob Storage. I’m going to assume that you’ve already got an Azure account; if you haven’t, you can set up a free trial which will see you good for this demo.
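
As a minimal T-SQL sketch of the technique (the storage account, container, and database names below are placeholders, and this assumes a SAS token for authentication):

-- With a SAS token, the credential name must match the container URL
CREATE CREDENTIAL [https://mystorageaccount.blob.core.windows.net/sqlbackups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token, without the leading ?>';

-- Back up straight to Blob Storage
BACKUP DATABASE [MyDatabase]
TO URL = 'https://mystorageaccount.blob.core.windows.net/sqlbackups/MyDatabase.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;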

Performance typically won’t be as good as backing up locally to disk, so if you need the fastest backup performance and cloud storage, the best route would be to write backups to disk and have a separate process which migrates them to Blob Storage, S3, or wherever. But in many cases, doing this directly can work out just fine, especially if you are already using an Azure-based VM.


Custom Parameters in Azure Data Factory Deployments

Rayis Imayev shows us how to use custom parameters in ARM templates when deploying Azure Data Factory pipelines:

If I needed to visually explain how this custom parameterization works for an Azure Data Factory resource, I would picture it this way. Before, you solely relied on publishing your ADF code from your collaboration Git branch to the adf_publish branch, where the ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json files live and get further deployed to other environments. You had some flexibility to parameterize your deployment or run some custom code to update ARM templates before they got deployed.

With the introduction of ADF custom parameterization, you have an additional JSON file, arm-template-parameters-definition.json, that you can use to define rules to add supplementary parameters to the main ARMTemplateParametersForFactory.json file. There is a very important statement on the Microsoft documentation site that explains how this new file operates: “A definition can’t be specific to a resource instance. Any definition applies to all resources of that type”. It’s like a garden rake that will collect all the leaves or none, i.e. if your rule defines a JSON property, let’s say the “timeout” of your ForEach loop container, then all timeouts will be scooped into the ARM template parameter file.
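
To make that concrete, a minimal arm-template-parameters-definition.json fragment in the style the documentation describes might look like the following (the property choice is illustrative; the =::type suffix tells ADF to surface the matched property as an ARM template parameter):

{
    "Microsoft.DataFactory/factories/pipelines": {
        "properties": {
            "activities": [{
                "typeProperties": {
                    "timeout": "=::string"
                }
            }]
        }
    }
}

Per the garden-rake behavior Rayis describes, a rule like this would parameterize the timeout of every matching activity in every pipeline, not just one specific ForEach container.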

Read on for the full explanation as well as an example.


Calculating Test Coverage of Azure Data Factory Pipelines

Richard Swinbank wraps up a series on testing in Azure Data Factory:

To determine which activities have been executed by a test suite, I need to collect and aggregate activity run data from every pipeline execution triggered from any test fixture. In the previous post I developed components to retrieve and cache activities for a pipeline run – I’ll use those components here to collect data systematically.

I’m going to create a new helper class to contain functions specific to coverage measurement. It’s a subclass of the database helper because I want to exploit functionality from classes further up the hierarchy:

Read on for the code and process for measurement.


Breaking out of Azure Data Factory ForEach Activities

Andy Leonard is planning a jailbreak:

“What if something fails inside the ForEach activity’s inner activities, Andy?”

That is an excellent question! I’m glad you asked. The answer is: The ForEach activity continues iterating.

I can now hear some of you asking…

“What if I want the ForEach activity to fail when an inner activity fails, Andy?”

Another excellent question, and you’ve found a post discussing one way to “break out” of a ForEach activity’s iteration.

Read on for the process.


Tips for Reducing Cloud Costs

Manas Narkar has a few tips for reducing the amount of money you spend on cloud infrastructure:

Cost optimization is a continuous process that evolves as you build your solutions. It starts with the initial architecture and continues throughout the entire solution lifecycle. Getting the architecture right will save you a lot of effort and money down the road. Having said that, you should regularly review your architectural approach and selection of services to adapt to business changes.

A fully cost-optimized system optimizes cloud resources without sacrificing performance and efficiency. When it comes to cost optimization, you can use several tools and techniques. The information below lists some of the core principles that you can apply to any cloud solution.

Costing items in the cloud is a good bit different than on-premises, to the point where entirely different architectures succeed.


The Key Concepts of Azure Synapse Analytics

Simon Whiteley takes a look at what Azure Synapse Analytics really is:

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the Spark engine, SQL serverless/on-demand, and various other bits… but I’m still getting the same question of “Cool!… but what actually is it?”. One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”… but it’s not the same as the full workspace. Having two products, both talked about in marketing, one generally available and one still in preview – it’s no wonder people are still confused!

Simon also has a video, which I recommend so that you can enjoy the funny way he pronounces “Synapse.” That said, next time I’m in the UK, it’ll be just as fair for someone to point out the funny way I pronounce “Synapse.” Also, you should watch the video because Simon knows the topic cold and does a great job of explaining things.


Improving Async Stats Update Concurrency

Dimitri Furman announces a change in Azure SQL Database:

In Azure SQL Database and Azure SQL Managed Instance, the background process that updates statistics asynchronously can now wait for the schema modification lock on a low priority queue. This improves concurrency for workloads with frequent query plan (re)compilations.

The new behavior is enabled with the ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration. This feature is currently in public preview.
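
Enabling it is a single database-scoped configuration change; as a quick sketch (assuming asynchronous statistics updates are already turned on for the database):

-- Prerequisite: statistics must be updated asynchronously
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Let the background stats update take its schema modification lock
-- on the low-priority queue
ALTER DATABASE SCOPED CONFIGURATION
SET ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY = ON;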

Dimitri does a good job of explaining what this means and how it can make life a little better for people querying tables with statistics updates.


Understanding the RESOURCE_GOVERNOR_IDLE Wait Type in Azure

Josh Darnell does some sleuthing:

With a big gap between CPU and elapsed time, it’s often worthwhile to check wait statistics. If the query was running, but not using CPU, it seems reasonable that it was waiting on something. Normally, with on-prem SQL Server, you’d have to check sys.dm_os_wait_stats, and take a diff of the cumulative values before and after.

However, thanks to (relatively) recent enhancements to execution plans (which keep getting better and better!), we can see a subset of what resources the query waited on right in the plan.

Looking at the plan from my Azure query, here’s what I see:

<Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="5733" WaitCount="323" />
<Wait WaitType="RESOURCE_GOVERNOR_IDLE" WaitTimeMs="5545" WaitCount="430" />

Notice that there were 5.5 seconds of RESOURCE_GOVERNOR_IDLE waits during this query. That explains the 5-second gap between CPU time and elapsed time. But what does it mean?
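
If you do need the DMV route on an instance where the plan doesn’t capture waits, the before-and-after diff Josh mentions looks something like this sketch (on Azure SQL Database, the database-scoped sys.dm_db_wait_stats would be the equivalent view):

-- Snapshot cumulative waits before running the query under investigation
SELECT wait_type, waiting_tasks_count, wait_time_ms
INTO #waits_before
FROM sys.dm_os_wait_stats;

-- ... run the query here ...

-- Diff the cumulative values to see what accumulated in between
SELECT ws.wait_type,
       ws.waiting_tasks_count - wb.waiting_tasks_count AS wait_count_delta,
       ws.wait_time_ms - wb.wait_time_ms AS wait_time_ms_delta
FROM sys.dm_os_wait_stats AS ws
JOIN #waits_before AS wb
  ON wb.wait_type = ws.wait_type
WHERE ws.wait_time_ms > wb.wait_time_ms
ORDER BY wait_time_ms_delta DESC;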

Click through to learn more about this in the context of Azure SQL Database.
