
Modern Data Warehousing with Data Lake Storage and Azure Data Factory

Josephine Bush continues a series on modern data warehousing:

In today’s data-driven world, having the right tools to manage and process large datasets is crucial. That’s where Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF) come in handy, making it easier than ever to store and transform your data. In this post, I’ll show you how to set up ADLS to store your Parquet files and configure ADF to manage your data flows efficiently.

Read on for an overview of both technologies.
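
To make the landing step concrete, here is a minimal sketch of dropping a Parquet file into ADLS from Python using pandas with the adlfs/fsspec driver. The storage account, container, and path names are placeholders, and in practice you would swap the account key for a service principal or managed identity:

```python
import pandas as pd  # requires pandas, pyarrow, and adlfs to be installed

# Sample data standing in for whatever you extract from a source system
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [19.99, 5.49, 102.00],
})

# abfs:// URLs are handled by adlfs; "sales" is a hypothetical container
# and "myadlsaccount" a hypothetical storage account name
df.to_parquet(
    "abfs://sales/raw/orders/orders_2024_06.parquet",
    storage_options={
        "account_name": "myadlsaccount",
        "account_key": "<storage-account-key>",  # placeholder credential
    },
)
```

From there, an ADF copy activity or data flow can pick the file up out of the raw container and take it through the rest of the pipeline.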

Backup Storage Redundancy in Cosmos DB

Manvendra Singh talks about backups:

This article will explain backup storage redundancy for Azure Cosmos DB. Backups are a critical feature for keeping copies of our data, ensuring protection and recoverability in case of accidental deletion, unwanted updates, or any kind of disaster. But running backups just to have copies is not enough. We must also protect those backup copies from accidental deletion or corruption, and make sure proper resiliency is in place to keep them safe from unforeseen circumstances. Depending on the criticality of your data, you may want to keep backups locally or replicate them to other locations or regions for resiliency.

The backup process isn’t the same as with a relational database, but it’s still critical to back up your data, for the same reasons that you’d take backups of relational data.
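
If you manage Cosmos DB through code, the redundancy of those backup copies is just a property on the account. Here is a hedged sketch using the azure-mgmt-cosmosdb SDK; the resource names are placeholders and the model and property names reflect my reading of the current SDK surface, so check them against the version you have installed:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cosmosdb import CosmosDBManagementClient
from azure.mgmt.cosmosdb.models import (
    DatabaseAccountUpdateParameters,
    PeriodicModeBackupPolicy,
    PeriodicModeProperties,
)

# Placeholder subscription, resource group, and account names
client = CosmosDBManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Switch the periodic backup policy to geo-redundant backup storage;
# "Local" and "Zone" are the other redundancy options
poller = client.database_accounts.begin_update(
    "my-resource-group",
    "my-cosmos-account",
    DatabaseAccountUpdateParameters(
        backup_policy=PeriodicModeBackupPolicy(
            periodic_mode_properties=PeriodicModeProperties(
                backup_interval_in_minutes=240,
                backup_retention_interval_in_hours=8,
                backup_storage_redundancy="Geo",
            )
        )
    ),
)
poller.result()  # wait for the account update to complete
```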

Azure SQL Database Watcher and Query Store

Kendra Little is happy:

I’ve spent a bit of time with Microsoft’s new database watcher tool for Azure SQL recently.

There are a lot of things I like about database watcher – which is currently in preview and which refuses to Capitalize Its Name – but it does one big thing that I really, really like: it collects data from Query Store. You can access that Query Store data from built-in database watcher dashboards, query it using KQL, or (something something) in Microsoft Fabric if you’ve got money to burn on your monitoring data.

Query Store has been available since SQL Server 2016, but I haven’t yet heard of monitoring tools that truly take advantage of it. It’s about time.
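
If you would rather pull that collected Query Store data out yourself instead of living in the built-in dashboards, the database watcher data store is an Azure Data Explorer (Kusto) database you can query with KQL. A rough sketch from Python follows; the cluster URI, database name, table name, and columns are all placeholders, so check your own data store for what database watcher actually writes:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Hypothetical cluster URI for a database watcher data store
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mywatcherstore.eastus.kusto.windows.net"
)
client = KustoClient(kcsb)

# Hypothetical table and column names; the KQL is just "top N by a metric"
query = """
query_store_runtime_stats
| where sample_time_utc > ago(1h)
| summarize total_cpu_ms = sum(cpu_time_ms) by query_id
| top 10 by total_cpu_ms desc
"""

response = client.execute("database_watcher_data", query)
for row in response.primary_results[0]:
    print(row["query_id"], row["total_cpu_ms"])
```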

This is where I’d also plug QDS Toolbox for on-premises environments. A good amount of the reporting information comes out of Query Store and it helps manage Query Store to boot.

MFA Requirement for Azure Users

Erin Chapple opens a can of worms:

This July, Azure teams will begin rolling out additional tenant-level security measures to require multi-factor authentication (MFA). Establishing this security baseline at the tenant level puts in place additional security to protect your cloud investments and company. 

MFA is a security method commonly required among cloud service providers and requires users to provide two or more pieces of evidence to verify their identity before accessing a service or a resource. It adds an extra layer of protection to the standard username and password authentication.

The problem is, there are a lot of good questions people are asking in the comments and currently, there are no answers.

Creating Orchestrators in Azure Data Factory

Martin Schoombee continues a series on building an orchestration framework in Azure Data Factory:

The orchestration layer of the framework is where all the magic happens. It facilitates the execution of processes and/or tasks as defined in the metadata, and needs to do it both seamlessly and efficiently. Ideally you would want to deploy this layer only once, and never have to touch it again. And it is really with that in mind that I designed this layer…to function independently and with minimal dependencies in both directions.

I would have loved for this layer to consist of only one pipeline but there are some nuances in Data Factory that make it impossible, the primary nuance being that you cannot nest ForEach activities. As a result, this layer contains three pipelines that will be covered by the sections below in more detail.

Read on to see what those three pipelines are.

Monitoring and Alerting on Fabric Capacity Metrics

Ron L’Esteve wants to know what’s happening:

With Microsoft Fabric now generally available, organizations are interested in implementing this flagship Unified Data and AI Intelligence Platform for several reasons. Its native integration within the Azure stack provides seamless and secure access to widely used technologies for data integration, business intelligence, and advanced analytics. Microsoft Fabric’s storage and compute capacity is utilized by resources within this unified analytics platform, including storage repositories, such as data warehouses and data lakes, and compute capacity for Power BI, Pipelines, DW processing, and artificial intelligence (AI)/machine learning (ML) workloads.

Fabric capacity can be purchased on Azure with a pay-as-you-go model, and a 60-day free trial (64 CUs) is offered to test the platform. Organizations that have an existing Power BI Premium capacity can easily enable access to Fabric by using the Microsoft Fabric admin switch. Enabling Fabric in Power BI Premium as opposed to Azure Portal creates a problem: there is no easy way to monitor and set alerts on your Fabric capacity metrics in the Azure Portal.

Click through to learn how to install and use the Microsoft Fabric Capacity Metrics App.

Building Workers in Azure Data Factory

Martin Schoombee continues a series on orchestration in Azure Data Factory:

We’re finally ready to dive into the Data Factory components that form part of the framework, and we’re going to work our way from the bottom up. To paraphrase the previous blog post, worker pipelines perform the actual work of either moving data (from source to staging) or executing a stored procedure that will load a dimension/fact table.

Although worker pipelines can contain any number of tasks you may need, my worker pipelines that move data from a source system into the staging area follow a similar pattern with at least the following activities:

Click through for that list, as well as more information.
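
The framework itself invokes these workers from the orchestration pipelines, but if you want to exercise a single worker on its own (say, while testing it), you can kick it off programmatically. Here is a minimal sketch with the azure-mgmt-datafactory SDK; the factory, pipeline, and parameter names are placeholders, not Martin's actual ones:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder subscription, resource group, and factory names
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Kick off a single worker pipeline, passing the metadata it expects as parameters
run = adf_client.pipelines.create_run(
    resource_group_name="rg-data-platform",
    factory_name="adf-orchestration-demo",
    pipeline_name="worker_load_staging_customers",  # hypothetical worker pipeline
    parameters={"SourceSystem": "CRM", "TargetSchema": "stg"},
)
print(f"Started pipeline run {run.run_id}")

# Check on the run afterwards
status = adf_client.pipeline_runs.get(
    "rg-data-platform", "adf-orchestration-demo", run.run_id
)
print(status.status)
```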

Editing the JSON of a Microsoft Fabric Pipeline

Dennes Torres makes a change:

A Fabric pipeline uses JSON as its source code, and pipelines are also saved in repositories as JSON.

The first idea we get is to edit the pipeline in JSON format. We could copy the JSON and create new pipelines with small variations, making changes directly to the JSON.

However, at first sight we’re disappointed, because the pipeline doesn’t allow the JSON to be edited. We have the option to view the JSON, but nothing else.

Read on to see how to tell the Fabric pipeline who’s boss.
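
No spoilers on Dennes’ approach, but one general-purpose way to get at (and push back) a pipeline’s JSON is the Fabric REST API’s item definition endpoints. A hedged sketch with plain requests follows; the workspace and item IDs are placeholders, and the endpoint paths, token scope, and payload shape are my reading of the Items API, so verify them against the current documentation:

```python
import base64
import json
import requests
from azure.identity import DefaultAzureCredential

# Token for the Fabric REST API (assumed scope)
token = DefaultAzureCredential().get_token("https://api.fabric.microsoft.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

workspace_id = "<workspace-guid>"     # placeholder
pipeline_id = "<pipeline-item-guid>"  # placeholder

# Fetch the pipeline's definition; parts come back base64-encoded.
# Note: for larger items this is a long-running operation that may return
# 202 with a polling URL; polling is omitted here for brevity.
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{pipeline_id}/getDefinition",
    headers=headers,
)
resp.raise_for_status()
parts = resp.json()["definition"]["parts"]

# Decode, tweak, and re-encode the pipeline JSON
for part in parts:
    if part["path"].endswith(".json"):
        content = json.loads(base64.b64decode(part["payload"]))
        # ... make your small variation here ...
        part["payload"] = base64.b64encode(json.dumps(content).encode()).decode()

# Push the edited definition back
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{pipeline_id}/updateDefinition",
    headers=headers,
    json={"definition": {"parts": parts}},
)
resp.raise_for_status()
```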

Comparing SQL Server to Databricks

Paul Andrew makes a comparison:

Over the many years I’ve been working in the data/IT industry, Microsoft SQL Server and Azure Databricks have easily become my two favourite data processing tools. When Databricks became a first-class resource in Microsoft Azure, it was a big moment for the evolution of the data platform architectures I’ve designed and built (but architecture isn’t the focus for this blog). That said, rather than considering the tooling and technology as an evolution, I find a lot of people drawing comparisons between the products. This often leads to confusion and friction, as they are ultimately offering a lot of different capabilities, with only some common areas where comparisons could be made.

Read on for Paul’s thoughts. Spoilers: I agree with pretty much all of it.

Building an Elastic Job with Bicep

Josephine Bush flexes some muscles:

Bicep is an open-source Domain-Specific Language (DSL) that simplifies the process of deploying Azure resources. It is an abstraction layer on top of Azure Resource Manager (ARM) templates, making it easier to write and understand infrastructure code. Bicep lets you describe your Azure infrastructure using a cleaner and more concise syntax than traditional ARM templates.

It’s definitely easier to read and work with Bicep than directly with ARM template JSON. Larger Bicep scripts can still be pretty confusing, but they’re still easier to write and maintain than the ARM equivalent.
