Cloud – Page 3 – Curated SQL

Data Quality Management with Great Expectations and Databricks

Data quality checks are critical for any production pipeline. While there are many ways to implement them, the Great Expectations library is a popular one.

Great Expectations is a powerful tool for maintaining data quality by defining, managing, and validating expectations for your data. In this article, we will discuss how you can use it to ensure data quality in your data pipelines.

Click through to see how it all works.
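
As a rough illustration of what "defining and validating expectations" means, here is a minimal sketch in plain Python. It deliberately does not use the Great Expectations API; the helper names (`expect_not_null`, `expect_between`, `validate`) and the sample rows are hypothetical and exist only to show the idea of declaring checks once and running them against a batch of data.

```python
# Hypothetical, library-free sketch of expectation-style data quality checks.
# None of these names come from Great Expectations itself.

def expect_not_null(column):
    """Build a check that passes when no row has None in `column`."""
    def check(rows):
        ok = all(row.get(column) is not None for row in rows)
        return f"{column} is not null", ok
    return check

def expect_between(column, low, high):
    """Build a check that passes when every value in `column` is in [low, high]."""
    def check(rows):
        ok = all(
            row.get(column) is not None and low <= row[column] <= high
            for row in rows
        )
        return f"{column} between {low} and {high}", ok
    return check

def validate(rows, expectations):
    """Run every expectation against the batch and collect pass/fail by name."""
    return dict(expectation(rows) for expectation in expectations)

# Made-up sample batch: the third row has a missing order_id.
rows = [
    {"order_id": 1, "quantity": 3},
    {"order_id": 2, "quantity": 0},
    {"order_id": None, "quantity": 5},
]

suite = [expect_not_null("order_id"), expect_between("quantity", 0, 100)]
results = validate(rows, suite)
print(results)
```

In a pipeline, a failing entry in `results` would typically stop the load or raise an alert; how the real library reports and acts on failures is covered in the linked article.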

Comments closed

Understanding Availability Zones in Azure

Published 2025-03-19 by Kevin Feasel

Mika Sutinen explains some of the nuance around Azure availability zones:

Azure Availability Zones help provide resiliency to your database services within an Azure Region. I simply love how simple Microsoft has made building geographically dispersed database services. If you’ve ever designed and deployed multi-site, highly available database services on-premises, you know what I am talking about.

However, with the Availability Zones in Azure, there are a couple of things to know. I’ve learned my lessons the hard way, so in this post I am providing some tools and guidance on how to avoid some pitfalls when building multi-zone database services.

Click through for that guidance.

Saving an Azure Database for PostgreSQL Backup to a Storage Account

Published 2025-03-17 by Kevin Feasel

Josephine Bush wants an extra copy of the backup:

This may or may not be helpful in the long term, but since I’m doing it to be super cautious, I figured I would blog about it. We migrated to Flex last week, and to be abundantly cautious, we’re putting the last single server backup into cold storage. You could also use this same process to offload Flex if you were going to delete a server and want to save a final backup or have some use case for saving backups to storage longer term.

Read on for the steps. It’s not as simple as running a command or two, but Josephine does take us through the process.

Serving Databricks Models via API Management Endpoints

Published 2025-03-10 by Kevin Feasel

Drew Furgiuele makes available a model:

When it comes to generative AI projects I’d argue that the hardest and most tedious part has moved into a new area: hosting and serving your models. Whether you’re working with CPU intensive models, or models that require GPU horsepower, sourcing the hardware, building out deployment pipelines, configuring monitoring, and then securing everything is real, serious work that requires everyone to lean in to get it right.

And then, there’s the real question of how you’re going to use those models: will you be setting up automation and doing batch processing using your models and infrastructure? Or do you want to get really serious and offer up real-time inference? If the latter, you can add one more thing to solve for: managing your front-end APIs that you will have to build to support that use case.

Click through to see how you can use an API management tool (like Azure API Management) to assist in these things.

Load Testing Azure SQL Databases

Published 2025-03-10 by Kevin Feasel

Reitse Eskens sets the stage:

Some time ago, I wrote a number of blog posts comparing the different Azure SQL options to give you some idea about performance, differences between tiers and differences between the Stock Keeping Units (SKUs). This was done by creating data in the database itself and reviewing the metrics. This worked fine and gave a good overview of the different tiers and SKUs. For reference, you can find those blogs here.

For the new series, I’ve thought of a new process that aligns more with my regular line of work, data warehousing. This means ingesting a lot of data and modelling it.

Click through for the summary of method and initial notes.

Cleaning up Azure Container Registries

Published 2025-03-07 by Kevin Feasel

Jess Pomfret does a bit of cleanup work:

Azure Container Registries can easily become cluttered with many versions of images. Did you know that each ACR SKU comes with a certain amount of storage included, and when you go over that, you’ll pay overage charges? Let’s look at how to check your current storage, keep your registry nice and tidy with an ACR clean-up task, and monitor the storage levels so you’ll never pay extra again!

It’s easy to run up the disk space usage with a container registry, especially if you have automated builds running.

Data Security in Snowflake

Published 2025-03-05 by Kevin Feasel

Anil Kumar Moka locks down a Snowflake instance:

In this practical guide, we’ll explore techniques to help you secure your use of Snowflake:

Foundational Security Setup

Secure Views and Their Critical Role

Row-Level Security Implementation Methods

Dynamic Data Masking Strategies

Encryption and Data Protection

Best Practices and Common Pitfalls

Read on for the full article.

Non-Deterministic Functions and Data Factory Logging

Published 2025-03-04 by Kevin Feasel

Richard Swinbank runs into a problem:

TL;DR:

Data Factory implementations in Fabric, Azure Synapse Analytics or Azure Data Factory evaluate pipeline expressions separately for logging and execution.

Log information reported from activities using non-deterministic functions may be unreliable.

Richard does give us a nice tl;dr, but still read the whole thing.
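
To make the tl;dr concrete, here is a toy model in Python of what "evaluated separately for logging and execution" implies. This is not Data Factory code: the runner functions and the `new_guid` expression are hypothetical stand-ins, with `uuid.uuid4()` playing the role of any non-deterministic pipeline function.

```python
import uuid

def new_guid():
    """Stand-in for a non-deterministic pipeline expression."""
    return str(uuid.uuid4())

def run_activity_separately(expression):
    """Toy model: the expression is evaluated once for the log
    and then again, independently, for the actual execution."""
    logged_value = expression()
    executed_value = expression()
    return logged_value, executed_value

def run_activity_once(expression):
    """Toy alternative: evaluate once and reuse the same value
    for both the log entry and the execution."""
    value = expression()
    return value, value

logged, executed = run_activity_separately(new_guid)
print("log matches execution:", logged == executed)

same_logged, same_executed = run_activity_once(new_guid)
print("log matches execution:", same_logged == same_executed)
```

In the first runner the logged value is simply a different GUID from the one that was used, which is the sense in which the log may be unreliable. Whether and how the "evaluate once" idea maps onto a real pipeline is something to take from Richard's post rather than from this sketch.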

Self-Hosted Integration Runtime Reconnecting to Cloud Service

Published 2025-03-04 by Kevin Feasel

Nivritti Suste handles an error:

In our organization, most data is stored on-premises, with a limited set of less critical data in the cloud. We use Azure to benefit from the cloud environment and Azure Data Factory (ADF) to move data.

With ADF, there are many components that need to integrate within the environment. The data on our on-premises servers needs to be shifted to the cloud periodically and we use Self-hosted Integration Runtime.

Our developers complain that an ADF pipeline is failing with the error: ‘The Self-hosted Integration Runtime is offline…’ What does this mean?

Click through for the answer.

Multi-Tenant Data Isolation Strategies

Published 2025-03-03 by Kevin Feasel

Rahul Miglani comes up with a list:

As organizations embrace cloud computing, multi-tenancy has become a popular architectural choice, enabling multiple customers (tenants) to share a single cloud environment. However, one of the biggest challenges in multi-tenancy is data isolation—ensuring that each tenant’s data remains private, secure, and accessible only to authorized users.

Microsoft Azure provides several data isolation strategies that allow businesses to securely manage and scale multi-tenant applications while ensuring compliance with regulatory standards like GDPR, HIPAA, and SOC 2.

In this blog, we will explore key data isolation strategies in multi-tenant Azure architecture, their advantages, and best practices for implementation.

Reading through the list, the same set of options is available on-premises, though the calculus can be a bit different.
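
The excerpt does not enumerate the strategies, so the following is only a sketch of one commonly used option, assumed here for illustration: a shared table in which every row carries a tenant identifier and every query is scoped to a single tenant. It uses an in-memory SQLite database from Python's standard library; the table, column, and tenant names are made up.

```python
import sqlite3

# In-memory database standing in for a shared multi-tenant store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "tenant_id TEXT NOT NULL, invoice_id INTEGER NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO invoices (tenant_id, invoice_id, amount) VALUES (?, ?, ?)",
    [
        ("contoso", 1, 100.0),
        ("contoso", 2, 250.0),
        ("fabrikam", 1, 75.0),
    ],
)

def invoices_for(connection, tenant_id):
    """Return only the rows belonging to one tenant.
    The tenant filter is applied in one place, with a bound parameter,
    so callers cannot forget it or inject into it."""
    cursor = connection.execute(
        "SELECT invoice_id, amount FROM invoices "
        "WHERE tenant_id = ? ORDER BY invoice_id",
        (tenant_id,),
    )
    return cursor.fetchall()

contoso_rows = invoices_for(conn, "contoso")
fabrikam_rows = invoices_for(conn, "fabrikam")
print(contoso_rows)
print(fabrikam_rows)
```

The same shape works on-premises or in the cloud. Stronger isolation, such as a schema or database per tenant, trades this simplicity for more separation and more to manage.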
