Cloud – Page 28 – Curated SQL

Recapping an Orchestration Framework

Published 2024-05-29 by Kevin Feasel

Frameworks are extremely useful when they are thoughtfully designed and implemented. I have seen both sides of the coin, but what I probably see most often is a lack of any sort of framework. What I typically see are some naming conventions and coding standards, but many companies miss the opportunity to take it one step further and reduce the inefficiencies of repetitive tasks. There’s a ton of repetition in ETL processes, and in my opinion that gives us a really good opportunity to streamline the way in which we are doing things with a well-designed framework.

Read on for Martin’s notes to keep in mind, as well as where to go from here.


Orchestration Controllers in Azure Data Factory

Published 2024-05-23 by Kevin Feasel

Martin Schoombee gets to the top of the pyramid:

Controllers are pipelines that initiate the execution of a single process or task in a specific order and with constraints. Whereas everything else in this framework is pretty automated, this part is entirely manual.

Why? Well, when I started thinking about the design of this framework I knew I needed something at the “highest level” that would execute an entire daily ETL process, or a modified ETL process that only loads specific data during the day. I wanted to maximize the flexibility of the framework, and that either meant adding another level to the metadata structure or creating this layer of pipelines that sit at the top. I opted for the second, because I did not feel it was worth the complexity of adding another layer into the metadata structure. That being said, it doesn’t mean it cannot or shouldn’t be done…it was a personal choice I made to keep things as simple as I could.

Read on to learn more about what the controller should look like and how the other pieces fit in.
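To make the idea of "a specific order and with constraints" a bit more concrete, here is a minimal Python sketch of what a controller does conceptually. It is not Martin's implementation: the step names, the `run_controller` function, and the `runner` callback are all hypothetical, and in Data Factory this role would be played by chained pipeline activities with dependency conditions rather than by code.

```python
from typing import Callable, Dict, List, Tuple

# Each step is (name, names of the steps it depends on). Hypothetical structure.
Step = Tuple[str, List[str]]


def run_controller(steps: List[Step], runner: Callable[[str], bool]) -> Dict[str, str]:
    """Run steps in the listed order; skip a step unless all its dependencies succeeded."""
    status: Dict[str, str] = {}
    for name, depends_on in steps:
        if all(status.get(dep) == "Succeeded" for dep in depends_on):
            # runner stands in for "execute this process" and returns True on success.
            status[name] = "Succeeded" if runner(name) else "Failed"
        else:
            status[name] = "Skipped"
    return status


# A hypothetical daily ETL run: staging first, then dimensions, then facts.
daily_etl: List[Step] = [
    ("Staging", []),
    ("Dimensions", ["Staging"]),
    ("Facts", ["Dimensions"]),
]
```

Calling `run_controller(daily_etl, runner)` with a `runner` that fails on `"Dimensions"` would mark `"Facts"` as skipped, which is the kind of constraint a manually built controller encodes.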


Modern Data Warehousing with Data Lake Storage and Azure Data Factory

Published 2024-05-21 by Kevin Feasel

Josephine Bush continues a series on modern data warehousing:

In today’s data-driven world, having the right tools to manage and process large datasets is crucial. That’s where Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF) come in handy, making it easier than ever to store and transform your data. In this post, I’ll show you how to set up ADLS to store your Parquet files and configure ADF to manage your data flows efficiently.

Read on for an overview of both technologies.


Backup Storage Redundancy in Cosmos DB

Published 2024-05-17 by Kevin Feasel

Manvendra Singh talks about backups:

This article will explain backup storage redundancy for Azure Cosmos DB. Backups are a critical feature: they keep copies of our data to ensure protection and recoverability in case of accidental deletion, an unwanted update, or any kind of disaster. But it is not enough to run backups and simply save the copies. We must also protect those backup copies from accidental deletion or corruption and ensure that proper resiliency is in place to keep them safe from unforeseen circumstances. Whether you keep the backups locally or replicate them to other locations or regions depends on the criticality of your data.

The backup process isn’t the same as with a relational database, but it’s still critical to back up your data, for the same reasons that you’d take backups of relational data.


Azure SQL Database Watcher and Query Store

Published 2024-05-17 by Kevin Feasel

Kendra Little is happy:

I’ve spent a bit of time with Microsoft’s new database watcher tool for Azure SQL recently.

There are a lot of things I like about database watcher– which is currently in preview and which refuses to Capitalize Its Name– but it does one big thing that I really, really like: it collects data from Query Store. You can access that Query Store data from built-in database watcher dashboards, query it using KQL, or (something something) in Microsoft Fabric if you’ve got money to burn on your monitoring data.

Query Store has been available since SQL Server 2016, but I haven’t yet heard of monitoring tools that truly take advantage of it. It’s about time.

This is where I’d also plug QDS Toolbox for on-premises environments. A good amount of the reporting information comes out of Query Store and it helps manage Query Store to boot.


MFA Requirement for Azure Users

Published 2024-05-16 by Kevin Feasel

Erin Chapple opens a can of worms:

This July, Azure teams will begin rolling out additional tenant-level security measures to require multi-factor authentication (MFA). Establishing this security baseline at the tenant level puts in place additional security to protect your cloud investments and company.

MFA is a security method commonly required among cloud service providers and requires users to provide two or more pieces of evidence to verify their identity before accessing a service or a resource. It adds an extra layer of protection to the standard username and password authentication.

The problem is, there are a lot of good questions people are asking in the comments and currently, there are no answers.


Creating Orchestrators in Azure Data Factory

Published 2024-05-15 by Kevin Feasel

Martin Schoombee continues a series on building an orchestration framework in Azure Data Factory:

The orchestration layer of the framework is where all the magic happens. It facilitates the execution of processes and/or tasks as defined in the metadata, and needs to do it both seamlessly and efficiently. Ideally you would want to deploy this layer only once, and never have to touch it again. And it is really with that in mind that I designed this layer…to function independently and with minimal dependencies in both directions.

I would have loved for this layer to consist of only one pipeline but there are some nuances in Data Factory that make it impossible, the primary nuance being that you cannot nest ForEach activities. As a result, this layer contains three pipelines that will be covered by the sections below in more detail.

Read on to see what those three pipelines are.
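The usual way around the "no nested ForEach" restriction is to give each pipeline at most one loop and have the outer loop call a child pipeline that holds the inner loop. The Python sketch below illustrates only that shape; the function names and the process/task structure are hypothetical, and it does not claim to mirror how the three pipelines in the post are actually divided.

```python
from typing import Dict, List


def worker(task: str) -> str:
    """Hypothetical stand-in for a worker pipeline that does the actual work."""
    return f"ran {task}"


def task_pipeline(process: Dict) -> List[str]:
    """Child pipeline: its single loop iterates over the tasks of one process."""
    return [worker(task) for task in process["tasks"]]


def process_pipeline(processes: List[Dict]) -> List[str]:
    """Parent pipeline: its single loop iterates over processes.

    Instead of nesting a second loop here, each iteration calls the child
    pipeline, much as a ForEach would invoke another pipeline.
    """
    results: List[str] = []
    for process in processes:
        results.extend(task_pipeline(process))
    return results


# Hypothetical metadata describing two processes and their tasks.
metadata = [
    {"name": "Staging", "tasks": ["Customer", "Orders"]},
    {"name": "Warehouse", "tasks": ["DimCustomer"]},
]
```

Each function contains exactly one loop, so the two levels of iteration are spread across two pipelines rather than nested inside one.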


Monitoring and Alerting on Fabric Capacity Metrics

Published 2024-05-13 by Kevin Feasel

Ron L’Esteve wants to know what’s happening:

With Microsoft Fabric now generally available, organizations are interested in implementing this flagship Unified Data and AI Intelligence Platform for several reasons. Its native integration within the Azure stack provides seamless and secure access to widely used technologies for data integration, business intelligence, and advanced analytics. Microsoft Fabric’s storage and compute capacity is utilized by resources within this unified analytics platform, including storage repositories, such as data warehouses and data lakes, and compute capacity for Power BI, Pipelines, DW processing, and artificial intelligence (AI)/machine learning (ML) workloads.

Fabric capacity can be purchased on Azure with a pay-as-you-go model, and a 60-day free trial (64 CUs) is offered to test the platform. Organizations that have an existing Power BI Premium capacity can easily enable access to Fabric by using the Microsoft Fabric admin switch. Enabling Fabric in Power BI Premium as opposed to Azure Portal creates a problem: there is no easy way to monitor and set alerts on your Fabric capacity metrics in the Azure Portal.

Click through to learn how to install and use the Microsoft Fabric Capacity Metrics App.


Building Workers in Azure Data Factory

Published 2024-05-09 by Kevin Feasel

Martin Schoombee continues a series on orchestration in Azure Data Factory:

We’re finally ready to dive into the Data Factory components that form part of the framework, and we’re going to work our way from the bottom up. To paraphrase the previous blog post, worker pipelines perform the actual work of either moving data (from source to staging) or executing a stored procedure that will load a dimension/fact table.

Although worker pipelines can contain any number of tasks you may need, my worker pipelines that move data from a source system into the staging area follow a similar pattern with at least the following activities:

Click through for that list, as well as more information.


Editing the JSON of a Microsoft Fabric Pipeline

Published 2024-05-09 by Kevin Feasel

Dennes Torres makes a change:

Fabric pipelines use JSON as source code. They are also saved in repositories as JSON.

The first idea we get is to edit the pipeline in JSON format. We can copy the JSON and create new pipelines with small variations, making changes directly in the JSON.

However, at first sight we are disappointed, because the pipeline doesn’t allow the JSON to be edited. We have the option to view the JSON, but nothing else.

Read on to see how to tell the Fabric pipeline who’s boss.
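As a rough illustration of the "copy the JSON and make small variations" idea, here is a short Python sketch. The pipeline structure is a deliberately simplified, hypothetical example rather than a real Fabric export, and `make_variant` is a made-up helper; getting the edited JSON back into Fabric is the part Dennes covers.

```python
import copy
import json
from typing import Dict

# Simplified, hypothetical pipeline definition used as a template.
template: Dict = {
    "name": "Load Customer",
    "properties": {
        "activities": [
            {"name": "Copy Customer", "type": "Copy"},
        ]
    },
}


def make_variant(pipeline: Dict, old_table: str, new_table: str) -> Dict:
    """Return a copy of the pipeline with the table name swapped in its names."""
    variant = copy.deepcopy(pipeline)  # leave the template untouched
    variant["name"] = variant["name"].replace(old_table, new_table)
    for activity in variant["properties"]["activities"]:
        activity["name"] = activity["name"].replace(old_table, new_table)
    return variant


orders_pipeline = make_variant(template, "Customer", "Orders")

# Serialize the variant so it can be saved alongside the original definition.
orders_json = json.dumps(orders_pipeline, indent=2)
```

The deep copy matters here: a shallow copy would share the nested activities list, so renaming an activity in the variant would also rename it in the template.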


