Press "Enter" to skip to content

Azure API Management in front of Databricks and OpenAI

Drew Furgiuele has a follow-up:

A few months ago, I wrote a blog post about using Azure API Management with Databricks Model Serving endpoints. It struck a chord with a lot of people using Databricks on Azure specifically, because more and more people and organizations are trying their damnedest to wrangle all the APIs they use and/or deploy themselves. Recently, I got an email from someone who read it and asked a really good question:

Click through for that question, as well as Drew’s answer.
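For those who haven’t read the original post, the shape of the pattern is worth a sketch. Here is a minimal, hypothetical Python example of what a caller sees once APIM fronts a Databricks Model Serving endpoint; the gateway URL and endpoint name are made up, and an APIM policy is assumed to hold the actual Databricks credential:

    import requests

    # Hypothetical values: APIM gateway URL and serving endpoint name.
    APIM_GATEWAY = "https://contoso-apim.azure-api.net/databricks"
    ENDPOINT_NAME = "my-model"

    response = requests.post(
        f"{APIM_GATEWAY}/serving-endpoints/{ENDPOINT_NAME}/invocations",
        headers={
            # Standard APIM subscription key header; an APIM policy is assumed
            # to attach the real Databricks bearer token on the way through.
            "Ocp-Apim-Subscription-Key": "<apim-subscription-key>",
            "Content-Type": "application/json",
        },
        # One common Model Serving payload shape.
        json={"dataframe_split": {"columns": ["feature1"], "data": [[1.0]]}},
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())

The draw of the pattern is that the caller only ever handles the APIM subscription key, while token management stays behind the gateway.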

Using a Child Pipeline Variable in a Parent Pipeline in Fabric Data Factory

Justin Bird passes back some information:

I answered a question on the Fabric community on return variables recently and thought I would expand upon it in a blog post. The question was how to use a variable derived in a child pipeline downstream in the parent pipeline. The person was specifically deriving a JSON object and wanted to iterate over its values in the parent pipeline.

Click through for the solution.
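The general shape of the pattern (not necessarily Justin’s exact solution): the child pipeline sets a pipeline return value, and the parent reads it from the Invoke Pipeline activity’s output. Here is a minimal Python sketch, with a hypothetical payload, of iterating such a JSON object once it reaches a notebook step in the parent:

    import json

    # Hypothetical return value from the child pipeline; in the parent, an
    # expression along the lines of
    #   @activity('Invoke child pipeline').output.pipelineReturnValue.tables
    # would surface it to downstream activities such as a ForEach.
    returned = '{"dim_customer": "2025-01-01", "dim_product": "2025-01-02"}'

    for table, watermark in json.loads(returned).items():
        print(f"{table} -> {watermark}")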

Optimizing Multi-Notebook Jobs in Microsoft Fabric and AWS Glue

Daniel Janik flips a switch:

Are your Microsoft Fabric pipelines with multiple notebooks running slower than you’d like? Are you paying for more Spark compute time than you should be? The culprit might be a simple setting that’s easy to miss. In this blog post, we’ll dive into the “For pipeline running multiple notebooks” setting in Microsoft Fabric and explain why enabling it can significantly improve your pipeline’s performance and reduce your costs.

Click through for this, as well as a comparison with AWS Glue and ways to perform something similar there.
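The reason the setting matters is that, without session sharing, each notebook pays its own Spark session startup cost. A related pattern achieves the same sharing explicitly from an orchestrating Fabric notebook; this is just a sketch, and the notebook names are hypothetical:

    # Run two (hypothetical) notebooks concurrently inside this notebook's
    # Spark session, so neither pays its own session startup cost.
    from notebookutils import mssparkutils

    results = mssparkutils.notebook.runMultiple(["Load_Customers", "Load_Orders"])
    print(results)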

What’s Old is New Again: Lakebases

Daniel Janik notes the cyclical nature of things:

For years, the narrative pushed was that traditional relational databases were ill-suited for the scale and complexity of modern BI solutions. The marketing was something like: “Databases don’t belong in BI; use Spark!” We embraced distributed computing frameworks, data lakes, and complex ETL pipelines to move data from operational databases into analytical engines. The idea was to separate transactional workloads from analytical ones to ensure performance and scalability. Spark, with its ability to handle massive datasets and flexible processing, became the darling of the data world.

“Remember, Sully, when I said you don’t need databases anymore?”

“Yeah, Matrix, I remember!”

“I lied.”

Zone Redundancy in Azure SQL Managed Instance

Arun Sirpal explains what zone redundancy is in Azure:

Do you know what happens when you enable zonal redundancy for your SQL managed instance?

Let’s define it first (in the context of the Business Critical tier) – zone redundancy is achieved by placing compute and storage replicas across three availability zones and then using the underlying Always On availability group to replicate data changes from the primary instance to standby replicas in the other availability zones.

Availability zones are in the same Azure region, so zone redundancy works well for high availability but isn’t as good for disaster recovery: if an entire region goes down, it won’t help you very much. Also, be aware that you’re paying for what’s running in all three zones, because TANSTAAFL (there ain’t no such thing as a free lunch).
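If you’d rather check the flag programmatically than in the portal, a sketch along these lines with the azure-mgmt-sql package should do it; the subscription, resource group, and instance names are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.sql import SqlManagementClient

    # Placeholder subscription ID, resource group, and instance name.
    client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

    mi = client.managed_instances.get("my-resource-group", "my-managed-instance")
    print(f"{mi.name}: zone_redundant = {mi.zone_redundant}")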

Copying Azure SQL Databases between Subscriptions

Kenneth Fisher is back in the fight:

I recently had to copy an Azure SQL database (SQL db) from one subscription to an Azure SQL Server instance in another subscription. All of the help I found suggested going to the database and hitting the COPY option. Unfortunately, when I did, I ran into a problem.

Read on for the issue, as well as one way to fix it. The route Kenneth landed on was the same one I ended up going with when I had a similar problem and very limited access to SQL DB on both subscriptions.

Accessing Delta Lake Tables as Iceberg Data

Matthew Hicks makes an announcement:

We’re thrilled to announce an exciting new Preview capability in OneLake: you can now automatically read Delta Lake tables using Apache Iceberg compatible readers, with no need for migration, copying, or manual conversion. This enhancement gives data engineers and analytics teams unprecedented flexibility in how they access and interact with their data.

This is pretty neat, given that Iceberg is the other popular format for data lakes.
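To make “Iceberg-compatible reader” concrete: a library like PyIceberg can load a table straight from an Iceberg metadata file. This is only a sketch; the OneLake path below is hypothetical, and the announcement covers where the converted metadata actually lands:

    from pyiceberg.table import StaticTable

    # Hypothetical OneLake path; reading abfss:// also assumes storage
    # credentials are configured for fsspec/adlfs.
    table = StaticTable.from_metadata(
        "abfss://workspace@onelake.dfs.fabric.microsoft.com/"
        "lakehouse.Lakehouse/Tables/my_table/metadata/v1.metadata.json"
    )
    print(table.scan().to_arrow())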

Goodbye Default Contributor Role in Fabric Workspace Identities

Varun Jain makes a security announcement:

Fabric workspace identity is an automatically managed service principal that can be associated with a Fabric workspace. Fabric workspaces with a workspace identity can securely read or write to firewall-enabled Azure Data Lake Storage Gen2 accounts through trusted workspace access for OneLake shortcuts. Fabric items can use the identity when connecting to resources that support Microsoft Entra authentication. Fabric uses workspace identities to obtain Microsoft Entra tokens without the customer having to manage any credentials. 

Previously, a workspace identity was automatically assigned the workspace contributor role and had access to workspace items.  

Read on to see what’s changing, why, and what you can do instead.

Bad Request Error Running PowerShell in Azure DevOps

Koen Verbeeck wants good requests:

I needed to run a PowerShell cmdlet in an Azure DevOps pipeline. The cmdlet in question was New-AzRoleAssignment, but the cmdlet itself isn’t important. What is important is that I needed to pass the object ID of a service principal to the command. Even though I was pretty sure the syntax and everything was correct, I got an “Operation returned an invalid status code ‘BadRequest’” error when the PowerShell was run (inside an Azure PowerShell task):

Read on to see how Koen diagnosed and resolved the issue.
