
Category: Cloud

Monitoring Oracle on Azure

Kellyn Pot’vin-Gorman covers several tools which are available for working with Oracle databases in Azure:

The Oracle SQL Developer product has come a long way since its inception, and much of that credit needs to go to the incredible team at Oracle, including those who are prevalent in the Oracle community, like Jeff Smith, Kris Rice, and Ashley Chen. Their willingness to listen to the Oracle community and turn their needs into features has been one of the critical reasons for the product’s success.

Although this product is focused more toward the developer, unlike the previous three, I want to point out a few areas that hopefully will convince you there are more similarities than differences.

The shortest version of this is “the same tools as exist on-prem,” but if you don’t know that answer, Kellyn’s got you covered.


Parsing ADF ARM Templates with T-SQL

Paul Andrew shows how you can use T-SQL to read an Azure Data Factory ARM template:

While documenting a customer’s data platform solution, I decided it would be far easier if we could summarise the contents of a fairly complex Data Factory using its ARM Template. So, this is what I’ve done using T-SQL to parse the ARM Template JSON and output a series of tables containing details about the factory components.

That is quite the clever solution.
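To get a feel for the technique before clicking through, here is a minimal sketch of my own (not Paul’s actual script) that loads an exported ARM template and shreds its resources array with OPENJSON; the file path and output columns are illustrative assumptions:

    -- Load the exported ARM template into a variable, then list every
    -- factory resource (pipelines, datasets, linked services, triggers).
    DECLARE @ArmTemplate nvarchar(max);

    SELECT @ArmTemplate = BulkColumn
    FROM OPENROWSET(BULK 'C:\Temp\arm_template.json', SINGLE_CLOB) AS j;

    SELECT
        JSON_VALUE(r.[value], '$.name') AS ResourceName,
        JSON_VALUE(r.[value], '$.type') AS ResourceType
    FROM OPENJSON(@ArmTemplate, '$.resources') AS r
    ORDER BY ResourceType, ResourceName;

Paul’s script goes much further, producing a series of tables with details about each factory component.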


Wrapping Up Azure Data Factory

Cathrine Wilhelmsen wraps up a long series on Azure Data Factory with three final posts. First is lookups:

Lookups are similar to copy data activities, except that you only get data from lookups. They have a source dataset, but they do not have a sink dataset. (So, like… half a copy data activity? :D) Instead of copying data into a destination, you use lookups to get configuration values that you use in later activities.

And how you use the configuration values in later activities depends on whether you choose to get the first row only or all rows.
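To make that concrete, the query behind a lookup is often very simple. This is a hypothetical sketch (my tables, not Cathrine’s example): a single-value query pairs naturally with “first row only,” while a multi-row result is what you would hand to a later activity, such as a ForEach, when you return all rows.

    -- "First row only": fetch a single configuration value, such as a watermark.
    SELECT MAX(LastLoadDate) AS LastLoadDate
    FROM etl.LoadWatermark
    WHERE SourceName = 'Sales';

    -- All rows: fetch the full list of tables to process in later activities.
    SELECT SchemaName, TableName
    FROM etl.SourceTableList
    WHERE IsEnabled = 1;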

From there, it’s the bottom-line question:

Congratulations! You’ve made it through my entire Beginner’s Guide to Azure Data Factory 🤓 We’ve gone through the fundamentals in the first 23 posts, and now we just have one more thing to talk about: Pricing.

And today, I’m actually going to talk! You see, in November 2019, I presented a 20-minute session at Microsoft Ignite about understanding Azure Data Factory pricing. And since it was recorded and the recording is available for free for everyone… Well, let’s just say that after 23 posts, I think we could both appreciate a short break from reading and writing.

In case you missed anything, Cathrine has a summary and shows where you can learn a lot more:

After this, I will be taking a break from creating new content. However, I will continue to edit, update, tweak, rewrite, and improve all 25 posts already published. I originally published one post per day as an Azure Data Factory Advent Calendar, and even while writing I noticed things that I didn’t have time to cover or things that I wanted to go back and improve. But! I needed to get all the posts published first. I consider this the first edition of the series. Now, the editing begins. Then, I will do my best to keep the content updated as Azure Data Factory keeps evolving.

This was a huge series; kudos to Cathrine for putting it all together.


Parameters, Variables, and ForEach Loops in ADF

Cathrine Wilhelmsen has a few more posts in the Azure Data Factory series for us. First up is on parameters:

We can build dynamic solutions!

Creating hardcoded datasets and pipelines is not a bad thing in itself. It’s only when you start creating many similar hardcoded resources that things get tedious and time-consuming. Not to mention, the risk of manual errors goes drastically up when you feel like you create the same resource over and over and over again.

After that is variables:

Parameters are external values passed into pipelines. They can’t be changed inside a pipeline. Variables, on the other hand, are internal values that live inside a pipeline. They can be changed inside that pipeline.

Parameters and variables can be completely separate, or they can work together. For example, you can pass a parameter into a pipeline, and then use that parameter value in a set variable or append variable activity.

And the latest post in the series is all about ForEach loops:

By default, the foreach loop tries to run as many iterations as possible in parallel. You can choose to run them sequentially instead, for example if you need to copy data into a single table and want to ensure that each copy finishes before the next one starts.

If you choose to run iterations in parallel, you can limit the number of parallel executions by setting the batch count. The default number is 20 and the max number is 50.

This has been a very nice series, and it looks like there is a little bit more to go.


Azure Data Factory Templates and Source Control

Cathrine Wilhelmsen continues a series on Azure Data Factory. First up is source control:

And yeah, I usually recommend that you set up source control early in your project, and not on day 18… However, it does require some external configuration, and in this series I wanted to get through the Azure Data Factory basics first. But by now, you should know enough to decide whether or not to commit to Azure Data Factory as your data integration tool of choice.

Next up is using the template gallery:

You can also create custom templates and share them with your team – or share them externally with others. Custom templates are saved in your code repository and will show up in the template gallery for you and your team. If you want to share them externally, you can easily export them, so others can import them in their Azure Data Factory.

Let’s take a look!

Read on to learn more.


Changing Synapse Analytics Resource Classes with Azure Automation

John McCormack wins today’s edition of Cloud Bingo:

I was a approached by a data warehouse developer who wanted to be able to take advantage of the feature in Azure Synapse Analytics (Formerly Azure SQL Data Warehouse) called ‘Resource classes’. Resource classes allow users in certain roles to have access to extra compute resources for their queries. This can increase their max memory per query (although there is an overall trade-off on concurrency). Unfortunately, due to this being managed via roles, the person looking to change resource classes needs to be in the db_owner role so they can manage database role membership. As I didn’t want to add the developer to the db_owner role, a different solution was required.
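For reference, a resource class in a dedicated SQL pool is simply a database role, so changing one is role-membership maintenance, which is exactly why db_owner rights normally come into play. Something along these lines (with an illustrative user name) is what an automation job would run on the developer’s behalf:

    -- Grant the larger resource class before a heavy load runs...
    EXEC sp_addrolemember 'largerc', 'DWLoadUser';

    -- ...and revert to the default class once the load is finished.
    EXEC sp_droprolemember 'largerc', 'DWLoadUser';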

John gives us a nice answer to this problem. Click through for the script.


Recommendations for Implementing Azure Data Factory

Paul Andrew has a nice set of recommendations you should follow when configuring Azure Data Factory:

Building on our understanding of generic datasets, a good Data Factory should include (where possible) generic pipelines; these are driven from metadata to simplify (as a minimum) data ingestion operations. Typically I use an Azure SQLDB to house my metadata, with stored procedures that get called via Lookup activities to return everything a pipeline needs to know.

This metadata driven approach means deployments to Data Factory for new data sources are greatly reduced and only adding new values to a database table is required. The pipeline itself doesn’t need to be complicated. Copying CSV files from a local file server to Data Lake Storage could be done with just three activities, shown below.
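As a rough illustration of the metadata side of that pattern (a simplified sketch of my own, not Paul’s framework), the store can start as small as one table plus one procedure for a Lookup activity to call:

    -- One row per source to ingest; onboarding a new source is just an INSERT.
    CREATE TABLE dbo.IngestionMetadata
    (
        SourceSystem varchar(50)  NOT NULL,
        SourcePath   varchar(500) NOT NULL,  -- e.g. folder on the file server
        TargetPath   varchar(500) NOT NULL,  -- e.g. folder in the data lake
        IsEnabled    bit          NOT NULL DEFAULT (1)
    );
    GO

    -- Called from a Lookup activity; a ForEach activity then iterates the rows.
    CREATE PROCEDURE dbo.GetIngestionWorkload
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT SourceSystem, SourcePath, TargetPath
        FROM dbo.IngestionMetadata
        WHERE IsEnabled = 1;
    END;
    GO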

There are several good recommendations here; read the whole thing.


Managing On-Prem Data Gateways

Jamie Wick has some advice for managing Azure on-premises data gateways:

When organizations become aware of the On-premises Data Gateway application, there can be an explosion in the number of gateways that are connected to the tenant. Some people will invariably install a ‘temporary’ gateway on their local computer while they investigate/develop a new PowerBI report, PowerApp or Azure Automation process. All of these ‘temporary’ gateway instances can create problems when the systems that use them are moved into production and remain connected to the ‘temporary’ gateways.

In a previous post, I wrote about how to install and use an On-premises Data Gateway to connect on-site data with Azure-based apps (PowerBI, PowerApps, etc). Now we’re going to look at how to curtail the sprawl of gateways that may be occurring and how to remove any existing gateways that are deemed to be unnecessary.

Sprawl happens, but Jamie shows you how to limit it.


Jupyter Notebooks and Cosmos DB

Hasan Savran shows how we can use Jupyter notebooks with Cosmos DB:

After you enable the Notebook option, you are ready to analyze or visualize your data thanks to the Python language and Python packages. Cosmos DB makes it easy to write Python and install custom packages to use with your data. There are a couple of great internal commands and wildcards you should know if you like to use notebooks in Azure Cosmos DB. The first one I want to introduce is the %%sql command. This command lets you select data from your containers by using the SQL API. You can select data and add it to your Python data frames. You need to define which database and container you want to use before you pass your query. In the following example, I want to use my database named Stackoverflow and my container named Posts. Then I pass the query.
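Based on that description, such a cell would look roughly like the following, using the Stackoverflow database and Posts container from the quote (treat the exact magic options as something to verify against the current Cosmos DB notebook documentation):

    %%sql --database Stackoverflow --container Posts
    SELECT TOP 10 * FROM c

From there, as Hasan describes, you can pull the results into a Python data frame and keep working with them.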

These are internal notebooks, meaning no separate Jupyter server is required. There’s a separate way of learning the Cosmos API from external notebooks.


Data Copy & Package Execution in ADF

Cathrine Wilhelmsen continues a series on Azure Data Factory. First, we get to see how to copy data from on-prem SQL Servers:

In the previous post, we looked at the three different types of integration runtimes. In this post, we will first create a self-hosted integration runtime. Then, we will create a new linked service and dataset using the self-hosted integration runtime. Finally, we will look at some common techniques and design patterns for copying data from and into an on-premises SQL Server.

And when I say “on-premises”, I really mean “in a private network”. It can either be a SQL Server on-premises on a physical server, or “on-premises” in a virtual machine.

Then, we learn how to run SSIS packages in Azure Data Factory:

Two posts ago, we looked at the three types of integration runtimes and created an Azure integration runtime. In the previous post, we created a self-hosted integration runtime for copying SQL Server data. In this post, we will complete the integration runtime part of the series. We will look at what SSIS Lift and Shift is, how to create an Azure-SSIS integration runtime, and how you can start executing SSIS packages in Azure Data Factory.

I’m going to guess that the next post will be all about the third integration runtime.
