Press "Enter" to skip to content

Category: Cloud

Unit Testing Azure Data Factory Pipelines

Richard Swinbank walks us through what it takes to run a unit test against an Azure Data Factory pipeline:

In part three of this series I looked at functional tests for ADF pipelines: verifying, in isolation, that pipelines are “doing things right”. In this post I’ll be testing isolated pipelines to check that they’re “doing the right things” – this is one description of a unit test. In a general-purpose programming language, unit tests might be used to verify that an individual line of code is executed, or that it has a particular effect. In Azure Data Factory, the smallest unit of development – a “line of code” – is a pipeline activity. I will be writing tests to verify that specific activities are executed (or not) and to inspect their results.

There’s a fair bit involved in this sort of test.
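
To make that concrete, here’s a minimal sketch of the general idea (this is an illustration, not Richard’s framework): trigger a pipeline in isolation, then query its activity runs and assert that a particular activity executed and succeeded. It assumes the azure-identity and azure-mgmt-datafactory Python packages, and every name (subscription, resource group, factory, pipeline, activity) is a placeholder.

```python
import time
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

SUB, RG, FACTORY = "<subscription-id>", "<resource-group>", "<factory-name>"
client = DataFactoryManagementClient(DefaultAzureCredential(), SUB)

def run_pipeline(name, parameters=None):
    """Trigger a pipeline run and poll until it finishes."""
    run_id = client.pipelines.create_run(RG, FACTORY, name, parameters=parameters or {}).run_id
    while client.pipeline_runs.get(RG, FACTORY, run_id).status in ("Queued", "InProgress"):
        time.sleep(15)
    return run_id

def activity_runs(run_id):
    """Return the activity runs recorded for a pipeline run."""
    window = RunFilterParameters(
        last_updated_after=datetime.now(timezone.utc) - timedelta(hours=1),
        last_updated_before=datetime.now(timezone.utc) + timedelta(hours=1),
    )
    return client.activity_runs.query_by_pipeline_run(RG, FACTORY, run_id, window).value

def test_copy_activity_ran_and_succeeded():
    run_id = run_pipeline("PL_LoadSales")                    # hypothetical pipeline name
    runs = {a.activity_name: a for a in activity_runs(run_id)}
    assert "Copy sales data" in runs                         # the activity executed
    assert runs["Copy sales data"].status == "Succeeded"     # and reported success
```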

Untrusted Shared Access Signature Certificates in SQL Server

William Assaf diagnoses an issue:

If you’ve tried doing Backup to URL with SQL Server using a Shared Access Signature (SAS) certificate and received this error:

Error: 18204, Severity: 16, State: 1.
BackupDiskFile::CreateMedia: Backup device ‘https://account.blob.core.windows.net/container/folder/msdb_log_202005170801.trn’ failed to create. Operating system error 50 (The request is not supported.).
Cannot open backup device ‘https://account.blob.core.windows.net/container/folder/msdb_log_202005170801.trn’. Operating system error 50 (The request is not supported.). [SQLSTATE 42000] (Error 3201)
BACKUP LOG is terminating abnormally. [SQLSTATE 42000] (Error 3013).
NOTE: The step was retried the requested number of times (1) without succeeding. The step failed.

You may have received the same error I encountered.

This error popped up only after startup of SQL Server. To resolve the problem, we’d recreate the SAS key, using the same cert in the same script, and the backups would start working again. This affected all types of SQL database backups.

William did some troubleshooting and explains the core problem. It’s a weird one.
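
For context, the moving parts in Backup to URL with a SAS are a server credential, named after the container URL and holding the SAS token as its secret, plus the BACKUP ... TO URL statement that uses it. The sketch below is a rough illustration of those pieces driven from Python via pyodbc, not William’s script; the server, account, database, and SAS values are placeholders.

```python
import pyodbc

# Connect to the SQL Server instance (connection details are placeholders).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,   # BACKUP cannot run inside an implicit transaction
)

container_url = "https://account.blob.core.windows.net/container"
sas_token = "<sas-token-without-leading-question-mark>"   # placeholder

# The credential name must match the container URL; the secret is the SAS token.
conn.execute(f"""
    CREATE CREDENTIAL [{container_url}]
        WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
             SECRET = '{sas_token}';
""")

# Back up to that container; the errors quoted above surface on this step.
conn.execute(f"BACKUP LOG [MyDatabase] TO URL = '{container_url}/MyDatabase_log.trn';")
```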

Working with ADLS Gen 2 in Power Query

Marco Russo takes us through some ways to optimize performance when working with Azure Data Lake Storage Gen 2 from Power Query:

With Power Query you can apply filters to the list obtained by the File System View option, thus restricting the access to only those files (or a single file) you are interested in. However, there is no query folding of this filter. What happens is that every time you refresh the data source, the list of all these files is read by Power Query; the filters in M Query to the folder path and the file name are then applied to this list only client-side. This is very expensive because the entire list is also downloaded when the expression is initially evaluated just to get the structure of the result of the transformation.

A better way to manage the process is to specify in the URL the complete folder path to traverse the hierarchy, and get only the files that are interesting for the transformation – or the exact path of the file if you are able to do that. For example, the data lake I used had one file for each day, stored in a folder structure organized by yyyy\mm, so every folder holds up to 31 files (one month).

Read on for more advice in this vein.
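
Marco’s guidance is specific to Power Query, but the underlying principle (ask the service for one folder rather than listing everything and filtering on the client) is easy to see with the azure-storage-file-datalake Python SDK; the account, filesystem, and folder names below are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account and filesystem (container) names.
service = DataLakeServiceClient(
    account_url="https://myaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("sales")

# Expensive pattern: enumerate every path in the filesystem, then filter on the client.
may_2017 = [p.name for p in fs.get_paths(recursive=True) if p.name.startswith("2017/05/")]

# Cheaper pattern: hand the yyyy/mm folder to the service and list only that month's files.
may_2017 = [p.name for p in fs.get_paths(path="2017/05", recursive=True)]
```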

Migrating from Azure SQL DB to an Azure SQL Managed Instance

Eitan Blumin walks us through the process of moving from Azure SQL Database to an Azure SQL Managed Instance:

What we cannot do:

– You cannot back up and restore from Azure SQL DB to a Managed Instance.
– You cannot use Azure SQL DB as a source in Azure Data Migration Service (DMS).
– You cannot use Azure SQL DB as a source in Data Migration Assistant (DMA).
– You cannot use Azure SQL DB as a source in DB Mirroring, AlwaysOn, Transaction Log Shipping, or Replication.
– Implementing a custom T-SQL, SSIS, or ADF (Azure Data Factory) solution will be too complicated, and will require an unacceptable development overhead.

That last one is a bit iffy, though Eitan’s two solutions are going to be easier than a custom solution.

The Power BI Pro’s Guide to Azure Synapse Analytics

Brett Powell gives a quick overview of a new whitepaper:

The Power BI Professional’s Guide to Azure Synapse Analytics, a white paper I wrote describing the Synapse Analytics platform and its benefits and use cases for Power BI professionals, was published a couple weeks ago. This post discusses a few themes from this paper and also shares a couple notes that weren’t included.

There are some interesting notes in here, so check those out and also get ahold of the whitepaper to understand how Power BI relates to the artist formerly known as Azure SQL Data Warehouse.

Passing Around JSON Arrays in Azure Data Factory

Rayis Imayev continues a series on JSON in Azure Data Factory:

It’s not a new thing to know that we can reference nested elements of ADF activities’ output, since it’s represented in JSON format, or pass the JSON file content to other tasks/components that can process this format.

But what if you need to pass the complete output of your ADF activity task further down your pipeline? Or you need to pass JSON array elements to another ADF activity or sub-pipeline as the parameter value? Let’s explore what other options are available in Azure Data Factory for this very interesting use case.

Read on for the demo.
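
As a generic illustration of the technique Rayis is describing (this isn’t his demo, and the activity, pipeline, and parameter names are made up), an Execute Pipeline activity can forward a Lookup activity’s JSON array, or its complete output serialized to a string, through pipeline parameters:

```python
import json

# Illustrative Execute Pipeline activity definition (hypothetical names throughout).
execute_child = {
    "name": "Run child pipeline",
    "type": "ExecutePipeline",
    "dependsOn": [{"activity": "Lookup files", "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        "pipeline": {"referenceName": "PL_ProcessFiles", "type": "PipelineReference"},
        "waitOnCompletion": True,
        "parameters": {
            # Pass just the JSON array returned by the Lookup activity to an Array parameter...
            "fileList": "@activity('Lookup files').output.value",
            # ...or serialize the activity's complete output into a String parameter.
            "rawOutput": "@string(activity('Lookup files').output)",
        },
    },
}

print(json.dumps(execute_child, indent=2))
```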

Stopping an ADF SSIS Integration Runtime

Andy Leonard shows how to shut off the Azure Data Factory SSIS integration runtime when you don’t need to use it:

I have blogged about stopping an Azure-SSIS IR in the past (Start Azure-SSIS, Execute Package, Stop Azure-SSIS). The pattern described in the Start Azure-SSIS, Execute Package, Stop Azure-SSIS post is sound and awesome. My only issue is the pipeline will report failure if the Azure-SSIS IR is not running when the Web Activity executes the Stop command. Mind you, the operation succeeds. But it reports failure.

Reporting failure when the operation succeeds seriously messes with my CDO (that’s OCD with the letters in the proper order).

Read on to see how to do this the best way.
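
Outside of ADF itself, the same idea (only issue the Stop command when the IR is actually running) can be sketched with the azure-mgmt-datafactory Python SDK; Andy’s post does this with pipeline activities instead, and the names below are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUB, RG, FACTORY, IR = "<subscription-id>", "<resource-group>", "<factory-name>", "Azure-SSIS-IR"
client = DataFactoryManagementClient(DefaultAzureCredential(), SUB)

# Check the IR's current state: 'Started', 'Starting', 'Stopped', 'Stopping', etc.
state = client.integration_runtimes.get_status(RG, FACTORY, IR).properties.state

if state == "Started":
    # Stop it and wait for the long-running operation to finish.
    client.integration_runtimes.begin_stop(RG, FACTORY, IR).result()
    print(f"{IR} stopped.")
else:
    # Already stopped (or stopping): succeed quietly rather than reporting failure.
    print(f"{IR} is '{state}'; nothing to stop.")
```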

Feeding Databricks Output to Azure SQL Database

Arun Sirpal takes us through the process of moving data from Databricks into Azure SQL Database:

Recently I got to a stage where I leveraged Databricks to the best of my ability to join a couple of CSV files together, play around with some aggregations, and then output the result back to a different mount point (based on Azure Storage) as a Parquet file. I then decided that I actually wanted to move this data into Azure SQL DB, which you may want to do one day.

This isn’t just dropping files into Blob Storage and picking them up, but rather a direct integration.
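
The general shape of that hop, reading the Parquet output back from the mount point and writing it to Azure SQL DB over JDBC, looks roughly like this in a Databricks notebook (connection details, paths, and table names are placeholders, and Arun’s post may use a different connector):

```python
# Runs in a Databricks notebook, where `spark` and `dbutils` are provided.
df = spark.read.parquet("/mnt/output/aggregated/")   # the Parquet output written earlier

jdbc_url = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"
    "database=mydb;encrypt=true;loginTimeout=30;"
)

(df.write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.SalesAggregated")
   .option("user", dbutils.secrets.get("my-scope", "sql-user"))        # credentials from a secret scope
   .option("password", dbutils.secrets.get("my-scope", "sql-password"))
   .mode("append")                                                     # or "overwrite"
   .save())
```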

Blob Storage Enhancements

James Serra is keeping on top of things around Azure Blob Storage:

Account failover: Customer-initiated storage account failover is now generally available, allowing you to determine when to initiate a failover instead of waiting for Microsoft to do so. When you perform a failover, the secondary replica of the storage account becomes the new primary. The DNS records for all storage service endpoints (blob, file, queue, and table) are updated to point to this new primary. Once the failover is complete, clients will automatically begin reading data from and writing data to the storage account in the new primary region, with no code changes. Customer-initiated failover is available for GRS, RA-GRS, GZRS, and RA-GZRS accounts. To learn more, see Disaster recovery and account failover.

Read on for several more improvements.
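
For reference, customer-initiated failover is also exposed through the management API; a rough sketch with the azure-mgmt-storage Python SDK (subscription, resource group, and account names are placeholders) looks like this:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

SUB, RG, ACCOUNT = "<subscription-id>", "<resource-group>", "<storage-account>"
client = StorageManagementClient(DefaultAzureCredential(), SUB)

# Check replication status (secondary region and last sync time) before failing over.
account = client.storage_accounts.get_properties(RG, ACCOUNT, expand="geoReplicationStats")
print(account.secondary_location, account.geo_replication_stats.last_sync_time)

# Kick off the failover; the secondary becomes the new primary and DNS is repointed.
client.storage_accounts.begin_failover(RG, ACCOUNT).result()
```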
