Press "Enter" to skip to content

Category: Cloud

Working with ADLS Gen 2 in Power Query

Marco Russo takes us through some ways to optimize performance when working with Azure Data Lake Storage Gen 2 from Power Query:

With Power Query you can apply filters to the list obtained by the File System View option, thus restricting the access to only those files (or a single file) you are interested in. However, there is no query folding of this filter. What happens is that every time you refresh the data source, the list of all these files is read by Power Query; the filters in M Query to the folder path and the file name are then applied to this list only client-side. This is very expensive because the entire list is also downloaded when the expression is initially evaluated just to get the structure of the result of the transformation.

A better way to manage the process is to specify in the URL the complete folder path to traverse the hierarchy, and get only the files that are interesting for the transformation – or the exact path of the file is you are able to do that. For example, the data lake I used had one file for each day, stored in a folder structure organized by yyyy\mm, so every folder holds up to 31 files (one month).

Read on for more advice in this vein.

Comments closed

Migrating from Azure SQL DB to an Azure SQL Managed Instance

Eitan Blumin walks us through the process of moving from Azure SQL Database to an Azure SQL Managed Instance:

What we cannot do:

– You cannot backup and restore from Azure SQL DB to a Managed Instance.
– You cannot use Azure SQL DB as a source in Azure Data Migration Service (DMS).
– You cannot use Azure SQL DB as a source in Data Migration Assistant (DMA).
– You cannot use Azure SQL DB as a source in DB Mirroring, AlwaysOn, Transaction Log Shipping, or Replication.
– Implementing a custom T-SQL, SSIS, or ADF (Azure Data Factory) solution will be too complicated, and will require an unacceptable development overhead.

That last one is a bit iffy, though Eitan’s two solutions are going to be easier than a custom solution.

Comments closed

The Power BI Pro’s Guide to Azure Synapse Analytics

Brett Powell gives a quick overview of a new whitepaper:

The Power BI Professional’s Guide to Azure Synapse Analytics, a white paper I wrote describing the Synapse Analytics platform and its benefits and use cases for Power BI professionals, was published a couple weeks ago. This post discusses a few themes from this paper and also shares a couple notes that weren’t included.

There are some interesting notes in here, so check those out and also get ahold of the whitepaper to understand how Power BI relates to the artist formerly known as Azure SQL Data Warehouse.

Comments closed

Passing Around JSON Arrays in Azure Data Factory

Rayis Imayev continues a series on JSON in Azure Data Factory:

It’s not a new thing to know that we can reference nested elements of ADF activities’ output since it’s represented in JSON format or pass the JSON file content to other tasks/components that can process this format.

But what if you need to pass a complete output of your ADF activity task further down your pipeline. Or you need to pass a JSON array elements to another ADF activity or sub-pipeline as the parameter value. Let’s explore what other options available in Azure Data Factory for this very interesting use case.

Read on for the demo.

Comments closed

Stopping an ADF SSIS Integration Runtime

Andy Leonard shows how to shut off the Azure Data Factory SSIS integration runtime when you don’t need to use it:

I have blogged about stopping an Azure-SSIS IR in the past (Start Azure-SSIS, Execute Package, Stop Azure-SSIS). The pattern described in the Start Azure-SSIS, Execute Package, Stop Azure-SSIS post is sound and awesome. My only issue is the pipeline will report failure if the Azure-SSIS IR is not running when the Web Activity executes the Stop command. Mind you, the operation succeeds. But it reports failure.

Reporting failure when the operation succeeds seriously messes with my CDO (that’s OCD with the letters in the proper order).

Read on to see how to do this the best way.

Comments closed

Feeding Databricks Output to Azure SQL Database

Arun Sirpal takes us through the process of moving data from Databricks into Azure SQL Database:

Recently I got to a stage where I leveraged Databricks to the best of my ability to join couple of CSV files together, play around some aggregations and then output it back to a different mount point ( based on Azure Storage) as a parquet file, I decided that I actually wanted to move this data into Azure SQL DB, which you may want to do one day.

This isn’t just dropping files into Blob Storage and picking them up, but rather a direct integration.

Comments closed

Blob Storage Enhancements

James Serra is keeping on top of things around Azure Blob Storage:

Account failover: Customer-initiated storage account failover is now generally available, allowing you to determine when to initiate a failover instead of waiting for Microsoft to do so.  When you perform a failover, the secondary replica of the storage account becomes the new primary. The DNS records for all storage service endpoints—blob, file, queue, and table—are updated to point to this new primary. Once the failover is complete, clients will automatically begin reading from and writing to data to the storage account in the new primary region, with no code changes. Customer initiated failover is available for GRS, RA-GRS, GZRS and RA-GZRS accounts.  To learn more, see Disaster recovery and account failover

Read on for several more improvements.

Comments closed

C# Notebooks with Cosmos DB

Hasan Savran takes us through Jupyter notebooks in Cosmos DB:

Jupyter Notebooks are in everywhere in these days. You can write chunk of code and run it on a web application without worrying about compiler is a great feeling. C# has been little bit late to the party, but we started to see C# Notebooks lately too. Azure Cosmos DB announced their version if C# Notebook this week.
     You can reach all notebook functionalities under the Data Explorer link, There are bunch of sample notebooks you will see under the Notebook link.

There are some limitations here, like needing to use the SQL API, but it’s an interesting approach to data access in Cosmos DB.

Comments closed

When to Have Multiple Azure Data Factories

Paul Andrew explains how to become a factory mogul:

The obvious and easy reason for having multiple Data Factory’s could be that you simply want to separate your business processes. Maybe they all have separate data delivery requirements and it just makes management of data flows easier to handle. For example:

– Sales
– Finance
– HR

They could have different data delivery deadlines, they process on different schedules and don’t share any underlying connections.

You may also have multiple projects underway that mean you want to keep teams isolated.

But that’s not the only reason, so click through to learn several other reasons why you might have multiple Azure Data Factory instances running.

Comments closed

Data Platform Announcements from Build

James Serra looks at some announcements from the Build conference:

A few data platform announcements yesterday at Microsoft Build that I wanted to blog about.

The biggest one is Azure Synapse Analytics is now available in public preview! You can immediately log into your Azure portal and use it. While in the Azure portal, search for “Synapse” and you will see “Azure Synapse Analytics (workspaces preview)”. Choose that and then click “Create Synapse workspace” (you first may need to register the resource provider “Microsoft.Synapse” in your subscription – see Azure resource providers and types).

James also covers other highlights, including Cosmos DB and Azure SQL Database Edge.

Comments closed