

Azure Data Factory Notifications

Rayis Imayev walks us through three different techniques for sending notifications in Azure Data Factory:

Working on data integration projects with Azure Data Factory as your main orchestration tool will help you develop strategic forward thinking about your development tasks: pondering what your data sources are, the destinations where this information will land in a new data model, and the transformation steps that shape data from source to destination. It is just like playing chess, where you have to plan several moves ahead.

Along with this structural thinking to develop and execute your data flows, timely notifications when something goes wrong (or right) would give you additional peace of mind.

Something I appreciate in this post is that Rayis contrasts the Azure Data Factory techniques with SSIS methods, giving you a solid base for comparison.
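
For a sense of that contrast: the SSIS-era baseline many of us reach for is Database Mail called from T-SQL, for example from a SQL Agent job step when a package fails. Here is a minimal sketch of that approach, not something from Rayis's post; the profile name and recipient are hypothetical and assume Database Mail is already configured:

```sql
-- Classic SSIS/SQL Agent-era failure notification via Database Mail.
-- Assumes Database Mail is set up; profile name and recipient are hypothetical.
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'DBA_Mail',
    @recipients   = N'data-team@example.com',
    @subject      = N'Nightly load failed',
    @body         = N'The nightly staging load failed. Check the job history for details.';
```

Azure Data Factory has no Database Mail to lean on, which is exactly why ADF-native notification techniques like the ones Rayis covers are worth knowing.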


Query Concurrency + Azure Synapse Analytics

James Serra takes us through query concurrency with Azure Synapse Analytics:

A common question I hear from customers is: given the performance of Azure Synapse Analytics (formerly called Azure SQL Data Warehouse or SQL DW), can they run Power BI dashboards against it using DirectQuery (and not have to use Azure Analysis Services (AAS), import the data into Power BI, or use Power BI aggregation tables), avoiding another copy of the data (saving money) and keeping the data “real time” (as of the last refresh of the data warehouse)?

There are two things to think about in considering an answer to this question. The first is whether you will get the performance you need (discussed in my last blog); the second is whether a certain number of concurrent queries or connections will cause a problem (the subject of this blog).

Read the whole thing.
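
If you want to keep an eye on concurrency yourself, here is a quick sketch (mine, not James's) against the dedicated SQL pool's DMVs, showing how many requests are running versus queued and how many concurrency slots are in play:

```sql
-- A minimal sketch for checking concurrency pressure on a dedicated SQL pool.
-- Run against the pool itself, not the master database.

-- Requests currently executing vs. queued.
SELECT [status], COUNT(*) AS request_count
FROM sys.dm_pdw_exec_requests
WHERE [status] IN ('Running', 'Suspended')
GROUP BY [status];

-- Concurrency slots held or requested by each request.
SELECT [state], [type], concurrency_slots_used, request_id
FROM sys.dm_pdw_resource_waits;
```

A steady stream of 'Suspended' requests is the usual sign that you are bumping into the concurrency limits James discusses.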


Monitoring Oracle on Azure

Kellyn Pot’vin-Gorman covers several tools which are available for working with Oracle databases in Azure:

The Oracle SQL Developer product has come a long way since its inception, and much of that credit needs to go to the incredible team at Oracle, including those who are prevalent in the Oracle community, like Jeff Smith, Kris Rice, and Ashley Chen. Their willingness to listen to the Oracle community and turn their needs into features has been one of the critical reasons for the product's success.

Although this product is focused more toward the developer, unlike the previous three, I want to point out a few areas that will hopefully convince you there are more similarities than differences.

The shortest version of this is "the same tools as exist on-prem," but if you don't know that answer, Kellyn's got you covered.


Parsing ADF ARM Templates with T-SQL

Paul Andrew shows how you can use T-SQL to read an Azure Data Factory ARM template:

While documenting a customer's data platform solution, I decided it would be far easier if we could summarise the contents of a fairly complex Data Factory using its ARM Template. So, this is what I've done using T-SQL to parse the ARM Template JSON and output a series of tables containing details about the factory components.

That is quite the clever solution.
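
Paul's post has the full script; to give a flavor of the approach, here is a stripped-down sketch (not his code) that points OPENJSON at the template's resources array. The inline template is a toy stand-in for a real exported ARM template:

```sql
-- Minimal sketch (not Paul's script): list resource names and types from an
-- ADF ARM template using OPENJSON. The JSON here is a simplified placeholder;
-- in practice you would load the real template, e.g. via OPENROWSET(BULK ...).
DECLARE @ArmTemplate NVARCHAR(MAX) = N'{
  "resources": [
    { "name": "MyFactory/WaitPipeline", "type": "Microsoft.DataFactory/factories/pipelines" },
    { "name": "MyFactory/SourceSqlDb",  "type": "Microsoft.DataFactory/factories/linkedServices" }
  ]
}';

SELECT  JSON_VALUE(r.[value], '$.name') AS resource_name,
        JSON_VALUE(r.[value], '$.type') AS resource_type
FROM OPENJSON(@ArmTemplate, '$.resources') AS r;
```

From there you can keep applying OPENJSON to each resource's $.properties to drill into pipelines, datasets, and linked services.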


Wrapping Up Azure Data Factory

Cathrine Wilhelmsen wraps up a long series on Azure Data Factory with three final posts. First is lookups:

Lookups are similar to copy data activities, except that you only get data from lookups. They have a source dataset, but they do not have a sink dataset. (So, like… half a copy data activity? :D) Instead of copying data into a destination, you use lookups to get configuration values that you use in later activities.

And how you use the configuration values in later activities depends on whether you choose to get the first row only or all rows.
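
On the database side, the "first row only" versus "all rows" choice usually just mirrors the shape of the query behind the lookup. Here is a hedged sketch with a hypothetical config table; this is my illustration, not something from Cathrine's post:

```sql
-- Hypothetical config table feeding ADF Lookup activities.
CREATE TABLE dbo.PipelineConfig
(
    ConfigKey   NVARCHAR(100) NOT NULL,
    ConfigValue NVARCHAR(400) NOT NULL,
    IsEnabled   BIT           NOT NULL
);

-- "First row only": a single row of settings, referenced later as
-- activity('Lookup Config').output.firstRow.ConfigValue.
SELECT TOP (1) ConfigValue
FROM dbo.PipelineConfig
WHERE ConfigKey = N'SourceContainer';

-- "All rows": a list you would typically hand to a ForEach activity,
-- referenced later as activity('Lookup Config').output.value.
SELECT ConfigKey, ConfigValue
FROM dbo.PipelineConfig
WHERE IsEnabled = 1;
```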

From there, it’s the bottom line question:

Congratulations! You’ve made it through my entire Beginner’s Guide to Azure Data Factory 🤓 We’ve gone through the fundamentals in the first 23 posts, and now we just have one more thing to talk about: Pricing.

And today, I’m actually going to talk! You see, in November 2019, I presented a 20-minute session at Microsoft Ignite about understanding Azure Data Factory pricing. And since it was recorded and the recording is available for free for everyone… Well, let’s just say that after 23 posts, I think we could both appreciate a short break from reading and writing.

In case you missed anything, Cathrine has a summary and shows where you can learn a lot more:

After this, I will be taking a break from creating new content. However, I will continue to edit, update, tweak, rewrite, and improve all 25 posts already published. I originally published one post per day as an Azure Data Factory Advent Calendar, and even while writing I noticed things that I didn’t have time to cover or things that I wanted to go back and improve. But! I needed to get all the posts published first. I consider this the first edition of the series. Now, the editing begins. Then, I will do my best to keep the content updated as Azure Data Factory keeps evolving.

This was a huge series; kudos to Cathrine for putting it all together.


Parameters, Variables, and ForEach Loops in ADF

Cathrine Wilhelmsen has a few more posts in the Azure Data Factory series for us. First up is on parameters:

We can build dynamic solutions!

Creating hardcoded datasets and pipelines is not a bad thing in itself. It’s only when you start creating many similar hardcoded resources that things get tedious and time-consuming. Not to mention, the risk of manual errors goes drastically up when you feel like you are creating the same resource over and over and over again.

After that is variables:

Parameters are external values passed into pipelines. They can’t be changed inside a pipeline. Variables, on the other hand, are internal values that live inside a pipeline. They can be changed inside that pipeline.

Parameters and variables can be completely separate, or they can work together. For example, you can pass a parameter into a pipeline, and then use that parameter value in a set variable or append variable activity.

And the latest post in the series is all about ForEach loops:

By default, the foreach loop tries to run as many iterations as possible in parallel. You can choose to run them sequentially instead, for example if you need to copy data into a single table and want to ensure that each copy finishes before the next one starts.

If you choose to run iterations in parallel, you can limit the number of parallel executions by setting the batch count. The default number is 20 and the max number is 50.

This has been a very nice series, and it looks like there is a little bit more to go.


Azure Data Factory Templates and Source Control

Cathrine Wilhelmsen continues a series on Azure Data Factory. First up is source control:

And yeah, I usually recommend that you set up source control early in your project, and not on day 18… However, it does require some external configuration, and in this series I wanted to get through the Azure Data Factory basics first. But by now, you should know enough to decide whether or not to commit to Azure Data Factory as your data integration tool of choice.

Next up is using the template gallery:

You can also create custom templates and share them with your team – or share them externally with others. Custom templates are saved in your code repository and will show up in the template gallery for you and your team. If you want to share them externally, you can easily export them, so others can import them in their Azure Data Factory.

Let’s take a look!

Read on to learn more.


Changing Synapse Analytics Resource Classes with Azure Automation

John McCormack wins today’s edition of Cloud Bingo:

I was approached by a data warehouse developer who wanted to be able to take advantage of the feature in Azure Synapse Analytics (formerly Azure SQL Data Warehouse) called ‘Resource classes’. Resource classes allow users in certain roles to have access to extra compute resources for their queries. This can increase their max memory per query (although there is an overall trade-off on concurrency). Unfortunately, due to this being managed via roles, the person looking to change resource classes needs to be in the db_owner role so they can manage database role membership. As I didn’t want to add the developer to the db_owner role, a different solution was required.

John gives us a nice answer to this problem. Click through for the script.
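
For context, the underlying T-SQL is plain database role membership, along the lines of this sketch (the user name is hypothetical); the catch John describes is that issuing it requires db_owner-level rights, hence the Azure Automation wrapper:

```sql
-- Resource classes in a dedicated SQL pool are database roles, so changing
-- a user's class is role membership management. 'LoadUser' is hypothetical.
EXEC sp_addrolemember 'largerc', 'LoadUser';   -- more memory per query, at a concurrency cost

-- To move the user back to the default (smallrc) class later:
EXEC sp_droprolemember 'largerc', 'LoadUser';
```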


Recommendations for Implementing Azure Data Factory

Paul Andrew has a nice set of recommendations you should follow when configuring Azure Data Factory:

Building on our understanding of generic datasets, a good Data Factory should include (where possible) generic pipelines, these are driven from metadata to simplify (as a minimum) data ingestion operations. Typically I use an Azure SQLDB to house my metadata with stored procedures that get called via Lookup activities to return everything a pipeline needs to know.

This metadata-driven approach means deployments to Data Factory for new data sources are greatly reduced; only adding new values to a database table is required. The pipeline itself doesn’t need to be complicated. Copying CSV files from a local file server to Data Lake Storage could be done with just three activities, shown below.

There are several good recommendations here; read the whole thing.
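
To make the metadata-driven idea concrete, here is a minimal sketch of the kind of table and stored procedure a Lookup activity could call; the names are mine, not Paul's:

```sql
-- Hypothetical metadata table: one row per source to ingest.
CREATE TABLE dbo.IngestionMetadata
(
    SourceId   INT IDENTITY(1,1) NOT NULL,
    SourcePath NVARCHAR(400)     NOT NULL,  -- e.g. folder on the file server
    TargetPath NVARCHAR(400)     NOT NULL,  -- e.g. lake container/folder
    IsEnabled  BIT               NOT NULL
);
GO

-- Called from a Lookup activity; the result set drives a ForEach + Copy pattern.
CREATE PROCEDURE dbo.GetIngestionWorklist
AS
BEGIN
    SET NOCOUNT ON;

    SELECT SourceId, SourcePath, TargetPath
    FROM dbo.IngestionMetadata
    WHERE IsEnabled = 1;
END;
```

Onboarding a new source then becomes an INSERT into the metadata table rather than a new Data Factory deployment, which is exactly the benefit Paul calls out.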


Managing On-Prem Data Gateways

Jamie Wick has some advice for managing Azure on-premises data gateways:

When organizations become aware of the On-premises Data Gateway application, there can be an explosion in the number of gateways that are connected to the tenant. Some people will invariably install a ‘temporary’ gateway on their local computer while they investigate/develop a new PowerBI report, PowerApp or Azure Automation process. All of these ‘temporary’ gateway instances can create problems when the systems that use them are moved into production and remain connected to the ‘temporary’ gateways.

In a previous post, I wrote about how to install and use an On-premises Data Gateway to connect on-site data with Azure-based apps (PowerBI, PowerApps, etc). Now we’re going to look at how to curtail the sprawl of gateways that may be occurring and how to remove any existing gateways that are deemed to be unnecessary.

Sprawl happens, but Jamie shows you how to limit it.
