Category: Cloud

Landing Zone Layouts for Modern Data Warehouses

Paul Hernandez builds out a landing zone for a warehouse:

In this article I want to discuss some different layout options for a landing zone in a modern cloud data warehouse architecture. By landing zone, I mean a storage account where raw data lands directly from its source system (not to be confused with a landing zone to move a system or application into the cloud).

One of the things I appreciate a lot about this post is that it covers the history, showing us how we got to where we are. Paul’s well-versed in each step along the way and lays things out clearly.

Model Deployment using Azure Functions

Alexander Billington needs to get that new model out:

Deploying machine learning (ML) models into production can be challenging, as it requires careful consideration of various factors such as scalability, reliability, and maintainability. While developing an ML model is an exciting process, deploying it into production can be a daunting task. The challenges faced in productionising data science projects can range from infrastructure to version control, model monitoring to integration with other systems. This blog will take a look at how Azure Functions can simplify the deployment process, getting models into production quickly and robustly to maximise their value.

I like this approach and find it interesting, as most of the time, the MLOps model Microsoft recommends has you scheduling Azure DevOps pipelines / GitHub Actions periodically or when new training data hits a specific folder. If you have some non-standard trigger for an action, this is a good way to get you going.

Refreshing a Power BI Dataset via HTTPS URL

Gilbert Quevauvilliers presses the big red button:

I have found that sometimes there are other systems that are loading data, and once they are complete they then want to refresh the Power BI Dataset.

Another way to do this is to use Power Automate, in which a system or user can request an HTTPS URL and, once it is called, that will then refresh the Power BI dataset.

I explain how to do this in the steps below.

Click through to see how to set up that job.
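As a rough illustration of the calling side, here is a minimal Python sketch of how an upstream system might hit such a URL once its load finishes. The URL, the empty JSON payload, and the use of POST are assumptions for illustration and are not taken from Gilbert's post; the real trigger URL and any expected body come from the flow you build.

```python
import json
import urllib.request
from typing import Optional


def build_refresh_request(flow_url: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build the POST request that would call the (hypothetical) flow URL."""
    if not flow_url.lower().startswith("https://"):
        raise ValueError("the trigger URL must use HTTPS")
    # Assumption: the flow accepts a JSON body; an empty object is sent by default.
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        flow_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_refresh(flow_url: str, timeout: float = 30.0) -> int:
    """Call the flow URL and return the HTTP status code of the response."""
    request = build_refresh_request(flow_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A loading job could call `trigger_refresh(...)` as its final step. Treat the URL as a secret, since anyone who has it can start the refresh.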

Data Validation with Great Expectations and Azure Functions

Eduard van Valkenburg does a bit of data validation:

Great Expectations is a popular Python-based OSS tool to validate data that comes into your data estate. And for me, validating incoming data is best done file by file, as the files arrive! On Azure there is no better platform for that than Azure Functions. I love the serverless nature of Functions, and with the triggers available for arriving blobs, as well as HTTP, Event Grid events, queues and others, there are some great patterns that allow you to build event-driven data architectures. We also now have the Python v2 framework for Azure Functions available, which makes the developer experience even better. So, let’s go through how to get it running.

This looks really interesting and tying it in to Azure Functions is a good idea assuming that the checks don’t run for too long.

Contrasting Azure IoT Hub and Event Hub

Brian Bønk lays out a quick comparison:

When working with Azure Data Explorer and loading data to the storage engine, you might have some streaming devices or services that should land in the engine.

Azure provides two out-of-the-box services:

  1. Azure IoT Hub
  2. Azure Event Hub

At first glance it seems like the two services are doing the exact same thing – sending events through to other services in Azure. But there are some differences.

Read on to see what these differences are.

Best Practices Assessment for Azure Arc-Enabled SQL Server Instances

Ganapathi Varma Chekuri takes us through an assessment:

Best practices assessment provides a mechanism to evaluate the configuration of your SQL Server. Once the best practices assessment feature is enabled, your SQL Server instance and databases are scanned to provide recommendations for things like SQL Server and database configurations, index management, deprecated features, enabled or missing trace flags, statistics, etc. Assessment run time depends on your environment (number of databases, objects, and so on), with a duration from a few minutes, up to an hour.

If you’re familiar with the assessment on Azure VMs, this is quite similar, though it extends to on-premises machines or VMs running in other cloud providers. This does require installing the agent and paying for an Arc-Enabled SQL Server instance, so it’s not free.

Estimating and Managing Pod Spread in AKS

Joji Varghese talks pod distribution in Azure Kubernetes Service:

In Azure Kubernetes Service (AKS), the concept of pod spread is important to ensure that pods are distributed efficiently across nodes in a cluster. This helps to optimize resource utilization, increase application performance, and maintain high availability.

This article outlines a decision-making process for estimating the number of Pods running on an AKS cluster. We will look at pod distribution across designated node pools, distribution based on pod-to-pod dependencies and distribution where pod or node affinities are not specified. Finally, we explore the impact of pod spread on scaling using replicas and the role of the Horizontal Pod Autoscaler (HPA). We will close with a test run of all the above scenarios.

Read on for tips, as well as a few web tools, which you can use to estimate and control pod spread in AKS.
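To give a feel for the arithmetic involved, here is a deliberately idealized Python sketch. It assumes replicas are spread as evenly as possible and that a per-node pod limit is the only constraint. The function names and the example numbers are hypothetical and not from Joji's article; the real scheduler also weighs resource requests, affinities, taints, and system pods.

```python
import math


def even_spread(replicas: int, nodes: int) -> list[int]:
    """Return pods per node when replicas are spread as evenly as possible."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    base, extra = divmod(replicas, nodes)
    # The first `extra` nodes each take one pod more than the rest.
    return [base + 1 if i < extra else base for i in range(nodes)]


def nodes_needed(replicas: int, max_pods_per_node: int) -> int:
    """Return the minimum node count if only the per-node pod limit applies."""
    if max_pods_per_node <= 0:
        raise ValueError("max_pods_per_node must be positive")
    return math.ceil(replicas / max_pods_per_node)


if __name__ == "__main__":
    print(even_spread(10, 3))     # [4, 3, 3]
    print(nodes_needed(100, 30))  # 4
```

These figures are only a starting estimate, so compare them against what the cluster actually schedules.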

Role-Based Access Controls in Amazon OpenSearch

Scott Chang and Muthu Pitchaimani show how to assign rights in Amazon OpenSearch to IAM groups:

Amazon OpenSearch Service is a managed service that makes it simple to secure, deploy, and operate OpenSearch clusters at scale in the AWS Cloud. AWS IAM Identity Center (successor to AWS Single Sign-On) helps you securely create or connect your workforce identities and manage their access centrally across AWS accounts and applications. To build a strong least-privilege security posture, customers also wanted fine-grained access control to manage dashboard permission by user role. In this post, we demonstrate a step-by-step procedure to implement IAM Identity Center to OpenSearch Service via native SAML integration, and configure role-based access control in OpenSearch Dashboards by using group attributes in IAM Identity Center. You can follow the steps in this post to achieve both authentication and authorization for OpenSearch Service based on the groups configured in IAM Identity Center.

Click through for the process.

Using the Log Replay Service to Migrate to Azure SQL MI

Rob Carrol makes a move:

The Log Replay Service (LRS) is a new Azure service that allows you to migrate your databases from SQL Server on-premises, SQL Server on Azure Virtual Machines, Amazon EC2, Amazon RDS for SQL Server, or Google Compute Engine to Azure SQL Managed Instance. LRS is a free cloud service that uses log shipping technology to enable custom migrations of databases from SQL Server 2008 through 2022.

Read on for some configuration options and tips on how to use the service.

Tips for Power BI Modeling with ADX

Dany Hoter shares some tips on creating star schema models with Azure Data Explorer:

Relationships between DQ tables are created as M:M by default. This is not a problem and even recommended with single direction.

Read on for several tips. What’s interesting as I read this is just how radically different the advice is for ADX utilization versus Power BI utilization, such as using strings to join dimensions to facts. That would be heresy in a Kimball-style model and is a common cause for slow-down in Power BI. Yet that’s the recommendation here for working with ADX, unless I’m misunderstanding Dany’s post.
