Minimum Viable Data Mesh in Azure

Paul Andrew was on a podcast:

For Paul, delivering a single data mesh data product on its own is not all that valuable – if you are going to go to the expense of implementing data mesh, you need to be able to satisfy use cases that cross domains. The greater value is in cross-domain interoperability: getting to a data product that wasn’t possible before. And you need to deliver the data platform alongside those first 2-3 data products; otherwise you create a very hard-to-support data asset, not really a data product.

When thinking about a minimum viable data mesh, Paul views an approach leveraging DevOps and CI/CD – Continuous Integration/Continuous Delivery – as crucial. You need repeatability and reproducibility to really call something a data product.

Click through for the interview as well as Scott Hirleman’s summary.


Finding Indexing Metrics in Cosmos DB

Hasan Savran looks at the numbers:

You might need Composite Indexes to make your queries more efficient; Cosmos DB does not create any Composite Indexes for you. You need to figure out which properties should have composite indexes, and then you need to change the indexing policy file to create them.

Indexing Metrics come to your aid when you need help with your indexing policy. They tell you which indexes the current query uses and give you hints about which other indexes you could create to make the query faster/cheaper. Like many other features of Cosmos DB, you need to write code using the SDK to see Indexing Metrics. The following example shows how to enable Indexing Metrics for your queries.

Click through for a code sample which shows how to collect index metrics.
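As a rough illustration of the same idea in Python, here is a minimal sketch, assuming a recent azure-cosmos package that exposes a populate_index_metrics flag on queries (the account, database, container, and query are all made up; check your SDK version's docs for the exact flag and header names):

```python
import base64

from azure.cosmos import CosmosClient

# Hypothetical account, database, and container names.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("appdb").get_container_client("orders")

# Ask Cosmos DB to return index utilization info alongside the results.
# The flag name follows the azure-cosmos SDK docs; older versions may lack it.
results = container.query_items(
    query="SELECT c.id FROM c WHERE c.city = @city AND c.total > @total",
    parameters=[
        {"name": "@city", "value": "Istanbul"},
        {"name": "@total", "value": 100},
    ],
    enable_cross_partition_query=True,
    populate_index_metrics=True,
)
items = list(results)  # drain the iterator so response headers get populated

# The metrics come back in a response header, base64-encoded.
raw = container.client_connection.last_response_headers.get(
    "x-ms-cosmos-index-utilization", ""
)
print(base64.b64decode(raw).decode("utf-8", errors="replace") if raw else "no metrics")
```

The decoded payload lists the single and composite indexes the query used, plus suggested composite indexes you could add to the indexing policy.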


Troubleshooting Firewall Issues with Azure SQL MI

Emanuele Meazzo sees a problem pop up regularly:

Here is something that will save you lots of time and headaches when trying to connect to Azure SQL Managed Instances, especially from on-premises servers or from other clouds; I had to repeat this multiple times to multiple actors, so I know it will happen to someone else too.

In most cases, “Connect Timeout” and/or “Cannot open server xxx requested by the login; Login failed” errors are caused by the firewall configuration and a lack of understanding of the SQL MI networking model. Let me explain:

Read on for that explanation.
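One quick sanity check before digging into firewall rules is to see whether the relevant ports are reachable at all. Here is a minimal sketch in Python; the hostname is hypothetical, and the port numbers follow the SQL MI networking model (1433 on the private endpoint, 3342 on the public endpoint, 11000-11999 under the redirect connection policy):

```python
import socket

def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical instance name. The private endpoint listens on 1433; the
# public endpoint (if enabled) listens on 3342.
host = "mymi.public.abc123.database.windows.net"
for port in (1433, 3342):
    print(f"{host}:{port} ->", "reachable" if tcp_probe(host, port) else "blocked")
```

If the port shows as blocked, you are in firewall/NSG territory; if it is reachable but the login still fails, the problem is elsewhere.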


Cost Savings with Azure Data Factory

Koen Verbeeck maximizes the savings:

As you might’ve noticed, pricing in ADF is not the same as it was in SSIS for example. In SSIS, you pay your SQL Server license and you’re done (well, and you buy a server to run it on). It doesn’t matter what you do with SSIS, the cost is the same. If you run 1 package or 1000 packages, there’s no difference except in your electricity bill. However, in ADF you pay more if you use it more. You pay for each action you do, you pay for each activity you use and for how long things are running. There are a couple of guidelines you can follow to try to minimize costs:

Read on for those guidelines and some specific helpful items.
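To make the billing model concrete, here is some back-of-the-envelope arithmetic. The prices are illustrative placeholders, not current list prices (check the ADF pricing page); the point is the shape of the model, where every run and every hour shows up on the bill:

```python
# Illustrative ADF cost model; all prices are placeholders.
activity_runs_per_day = 500      # orchestration: billed per activity run
price_per_1000_runs = 1.00       # illustrative

copy_hours_per_day = 2.0         # data movement: billed per DIU-hour
dius = 4                         # Data Integration Units per copy activity
price_per_diu_hour = 0.25        # illustrative

daily = (activity_runs_per_day / 1000) * price_per_1000_runs
daily += copy_hours_per_day * dius * price_per_diu_hour
print(f"~${daily:.2f}/day, ~${daily * 30:.2f}/month")

# Halving the number of runs (by batching) or the DIU count halves the
# corresponding term, which is the shape of guidelines like Koen's.
```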


Azure Resource Locks

Craig Porteous explains the benefit (and pain) behind resource locks in Azure:

In theory, these are perfect for preventing accidental (or deliberate) deletion of resources in Azure. They don’t prevent the deletion of data though, only operating at the “control plane” of a resource. That still sounds great though. Turn them on everywhere! That’s another layer of security in your cloud data platform. Right?

Yeah, here’s where the pain comes in. I tried using resource group locks but there are some resources which use delete capabilities, such as Azure Media Service. A delete lock means no ability to delete uploaded videos.


Low-Code Churn Prediction with Synapse Analytics

Gavita Regunath shows off a capability in Azure Synapse Analytics:

We will build a machine learning solution to predict churn using Azure Synapse Analytics and Azure Machine Learning.

Azure Synapse Analytics is Microsoft’s limitless analytics platform that combines enterprise data warehousing and big data analytics. In simple terms, it is a one-stop-shop that allows you to ingest, prepare, and manage data that can then be used for machine learning and business intelligence, all from a single place. It provides a unified platform and encourages collaboration between data and machine learning professionals.

This article will show you how to build an end-to-end solution to train a machine learning model from Azure Synapse Analytics using AutoML functionality within Azure Machine Learning. Using the T-SQL PREDICT statement, we can then use the trained machine learning model to make predictions against the churn dataset stored in the SQL pool table. One of the key benefits of working from within Azure Synapse is that all the necessary steps required to train and make predictions with the trained model can be done from a single platform, Azure Synapse.

Click through for the three-step process and a demonstration.
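For a flavor of the scoring step, here is a hedged sketch of what a PREDICT query against a dedicated SQL pool can look like, wrapped in Python via pyodbc. The connection details, table, and model names are all hypothetical, and the model is assumed to be stored as ONNX in a table, which is what Synapse's PREDICT runtime expects:

```python
import pyodbc

# Hypothetical workspace, database, table, and model names.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;Database=sqlpool01;"
    "Authentication=ActiveDirectoryInteractive;"
)

# T-SQL PREDICT runs the ONNX model stored in dbo.Models against the
# rows in dbo.ChurnData, entirely inside the dedicated SQL pool.
sql = """
SELECT d.CustomerId, p.Churn_predicted
FROM PREDICT(
        MODEL = (SELECT model FROM dbo.Models WHERE model_name = 'churn'),
        DATA = dbo.ChurnData AS d,
        RUNTIME = ONNX)
WITH (Churn_predicted BIT) AS p;
"""
for row in conn.execute(sql):
    print(row.CustomerId, row.Churn_predicted)
```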


Replacing Common Table Expressions in ADF Dataflows

Jeet Kainth needs an alternative:

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:

– re-write the query using subqueries instead of CTEs

– use a stored procedure that contains the query and reference the stored proc in the source of the dataflow

– write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

Jeet focuses on the third alternative. I’d lean toward the second or the third alternative myself, probably the second one (stored procedures), but both allow me to create an interface between ADF and the database. That way, underlying table changes will be less likely to require code changes in ADF.
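To illustrate the view option, here is a hypothetical before/after: the CTE query a dataflow source won't accept, and the view that wraps it so the source can simply select from the view. Table and column names are invented for the example:

```python
# The query with a CTE: not usable directly in an ADF dataflow source.
cte_query = """
WITH latest AS (
    SELECT CustomerId, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerId
                              ORDER BY OrderDate DESC) AS rn
    FROM dbo.Orders
)
SELECT CustomerId, OrderDate FROM latest WHERE rn = 1;
"""

# The same logic wrapped in a view: the CTE is fine inside the view,
# and the dataflow source just references dbo.vLatestOrders.
create_view = """
CREATE VIEW dbo.vLatestOrders AS
WITH latest AS (
    SELECT CustomerId, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerId
                              ORDER BY OrderDate DESC) AS rn
    FROM dbo.Orders
)
SELECT CustomerId, OrderDate FROM latest WHERE rn = 1;
"""
```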


Azure Shared Disk with Zone-Redundant Storage

Dave Bermingham runs some tests:

What makes this interesting is that you can now build shared-storage-based failover cluster instances that span Availability Zones (AZs). With cluster nodes residing in different AZs, users can now qualify for the 99.99% availability SLA. Prior to support for ZRS, Azure Shared Disks only supported Locally Redundant Storage (LRS), limiting cluster deployments to a single AZ and leaving users susceptible to outages should an AZ go offline.

There are however a few limitations to be aware of when deploying an Azure Shared Disk with ZRS.

Dave also checks how the performance of ZRS shared disks compares to locally-redundant storage.


Partial Update Operations in Cosmos DB

Hasan Savran partially deflates the partial update bubble:

Partial Update was one of the features most wanted by Cosmos DB customers. In a regular update operation, you need to send the whole JSON document to Cosmos DB. This can be silly if your data model is large and you want to update one field in it. With a regular update, your request object will be large because you need to send the whole data model. A regular update operation needs more resources from the client/SDK and more network bandwidth.

You might think that partial updates would cost fewer request units. Unfortunately, this is not the case, because Cosmos DB still needs to open the JSON document, change the necessary properties, and save the data. Cosmos DB uses almost the same amount of CPU and memory for a regular update as for a partial update.

The fact that it costs just about as much as a full write does reduce the value of partial updates. Still, there is some value in reducing bandwidth requirements or in making changes when you don’t know the entire contents of the document up front.
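For reference, here is what a partial update looks like from the Python SDK, which exposes it as patch_item; the account, container, and document details are hypothetical. The request carries only the listed operations rather than the whole document, which is where the bandwidth savings come from even though the RU cost stays close to a full replace:

```python
from azure.cosmos import CosmosClient

# Hypothetical account, database, container, and document.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("appdb").get_container_client("users")

# Only these operations go over the wire, not the whole document.
container.patch_item(
    item="user-42",
    partition_key="user-42",
    patch_operations=[
        {"op": "replace", "path": "/lastLogin", "value": "2022-06-01T09:00:00Z"},
        {"op": "incr", "path": "/loginCount", "value": 1},
    ],
)
```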
