
Category: Cloud

Writing Sparse Pandas DataFrames to S3

Pooja Chhabra tries a few things:

If you’ve worked with large-scale machine learning pipelines, you probably know that one of the most frustrating bottlenecks isn’t always the complexity of the model or the elegance of the architecture; often, it’s writing the output efficiently.

Recently, I had to clear a tricky data engineering hurdle: writing a massive Pandas sparse DataFrame (the high-dimensional output of a CountVectorizer) directly to Amazon S3. By massive, I mean tens of gigabytes of feature data, stored in a memory-efficient sparse format, that had to be materialized as a raw CSV file. This legacy requirement existed because our downstream machine learning model was built to ingest only that format, which left us with a significant I/O challenge that threatened to derail our entire processing timeline.

Read on for two major constraints, a variety of false starts, and what eventually worked.
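The core idea behind any sparse-to-CSV write like this is to avoid densifying the whole DataFrame at once. Below is a minimal sketch that assumes every column has a pandas `SparseDtype` and that the destination is any writable text stream. The hypothetical `write_sparse_csv` helper densifies one block of rows at a time and appends it to the stream. This is not necessarily the approach the post settles on. For S3, the stream could come from a library such as s3fs or smart_open, but that wiring is an assumption and is not shown here.

```python
import io

import pandas as pd


def write_sparse_csv(df: pd.DataFrame, out, chunk_rows: int = 10_000) -> None:
    """Write an all-sparse DataFrame to a text stream as CSV, one row block at a time.

    Only one densified block of `chunk_rows` rows is held in memory at once,
    instead of the fully dense frame. (Hypothetical helper, for illustration.)
    """
    for start in range(0, len(df), chunk_rows):
        block = df.iloc[start:start + chunk_rows]
        # .sparse.to_dense() requires every column to have a SparseDtype.
        dense_block = block.sparse.to_dense()
        # Write the header only with the first block; later blocks are appended.
        dense_block.to_csv(out, index=False, header=(start == 0))


# Usage: a tiny stand-in for CountVectorizer output.
counts = pd.DataFrame({"cat": [1, 0, 0], "dog": [0, 2, 0], "fish": [0, 0, 3]})
sparse_counts = counts.astype(pd.SparseDtype("int64", 0))

buf = io.StringIO()
write_sparse_csv(sparse_counts, buf, chunk_rows=2)
print(buf.getvalue())
```

The `chunk_rows` value trades peak memory against the number of write calls. At tens of gigabytes, the right size depends on column count and available RAM.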


IOPS Slider in Azure SQL Managed Instance Next-Gen

John Morehouse cranks that slider to the right:

If you’ve used Azure SQL Managed Instance General Purpose, you know the drill: to boost memory or I/O, you had to scale the whole instance, paying for extra CPU you might not need—and hoping the upgrade fixed the bottleneck.

It worked but wasn’t elegant and could be slow or awkward. Scaling sometimes took hours when time was of the essence.

The Next-Gen Azure SQL Managed Instance marks a major shift from the old model. It was way overdue.

The downside is that there’s still a per-CPU hard cap on IOPS, and it’s low. Granted, it’s only about two orders of magnitude lower than what I’d expect from a decent on-premises solution, but that’s still enough to severely limit my ability to recommend SQL Managed Instance to anybody.


Microsoft Fabric Eventstream Pricing

Anasheh Boisvert puts on the green eyeshade:

In this blog post, we’ll walk through Eventstream’s pricing model to give you a clear understanding of how it works and help you navigate it with confidence.

By the end of this post, you will be able to:

  • Comprehend how Eventstream pricing is structured across its components.
  • Understand the relationship between Eventstream components and billing meters.
  • Review detailed pricing examples to support precise and confident cost estimation.

Read on for a breakdown of the components and several examples.
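The pricing arithmetic comes down to multiplying each meter's usage by its capacity-unit (CU) rate and then summing the results. Below is a minimal sketch under a simplified model in which some meters accrue per hour and others per GB of data. The meter names, CU rates, and price per CU-hour are hypothetical placeholders, not Microsoft's published numbers, so substitute the real values from the post. Converting CU-hours to currency is only a pay-as-you-go approximation. On a fixed capacity, the CU-hours instead tell you how much of that capacity you consume.

```python
from dataclasses import dataclass


@dataclass
class MeterUsage:
    name: str
    cu_hours_per_unit: float  # HYPOTHETICAL rate: CU-hours consumed per hour or per GB
    units: float              # hours of runtime, or GB of data, depending on the meter


def estimate_cost(usages: list[MeterUsage], price_per_cu_hour: float) -> tuple[float, float]:
    """Return (total CU-hours, estimated cost) by summing rate * usage across meters."""
    cu_hours = sum(u.cu_hours_per_unit * u.units for u in usages)
    return cu_hours, cu_hours * price_per_cu_hour


# Usage with placeholder numbers only: one hourly meter running a 30-day month,
# plus one per-GB meter for 100 GB of traffic.
usages = [
    MeterUsage("hypothetical per-hour meter", cu_hours_per_unit=0.25, units=720),
    MeterUsage("hypothetical per-GB meter", cu_hours_per_unit=0.5, units=100),
]
cu_hours, cost = estimate_cost(usages, price_per_cu_hour=0.2)
print(f"{cu_hours:.1f} CU-hours, estimated cost {cost:.2f}")
```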


Configuring a Point-to-Site VPN in Azure

Aleksey Vitsko wants access to private endpoints:

You have resources in Azure (including, but not limited to, Azure SQL), and your task is to eliminate the use of public endpoints. Security requirements call for communicating with resources, such as database servers, through encrypted VPN channels.

This is the “individual machines connect to Azure” VPN, suited to remote workers, whereas Azure also has a Site-to-Site VPN for connecting an entire office network.


Using Fabric Cost Analysis

James Serra tries out a tool:

Enter Fabric Cost Analysis (FCA), a free, open-source solution available to everyone in a Microsoft GitHub repository and designed to shine a light on all your Microsoft Fabric costs. FCA was developed by a multidisciplinary team (Cedric Dupui, Manel Omani, and Antoine Richet, led by Romain Casteres) with expertise spanning FinOps, Data, and Go-To-Market, and with a clear goal: turn a major adoption barrier into a strategic lever for growth.

Conceived directly from customer questions, FCA answers the things people actually want to know: What are we really paying for? What’s included? Where are the optimization opportunities? It doesn’t just track costs—it builds trust, helps organizations explain spend internally, and ultimately accelerates Fabric adoption.

Read on to see what it includes and how it works.


An Overview of SQL Database in Microsoft Fabric

Rebecca Lewis shares some thoughts:

Now let’s look at an actual transactional database running inside Fabric.

SQL database in Microsoft Fabric became generally available at Ignite in November 2025. This isn’t a data warehouse. It’s not a lakehouse with a SQL endpoint. It’s a real OLTP database — based on the same engine as Azure SQL Database — designed for operational workloads, running as a fully managed SaaS service inside your Fabric capacity.

Read on for some thoughts around capabilities and current limitations.


Budgeting in Azure

John Morehouse breaks out the envelopes:

When organizations migrate workloads to Azure, the focus is usually on architecture, performance, and security. Cost management should be part of that conversation—but in practice, it’s often treated as an afterthought. One of the most overlooked and underutilized tools in Azure is Budgets, despite the fact that it can prevent unpleasant billing surprises with minimal effort.

Azure budgeting is useful but not great. I think it relies too much on notifications without enough teeth. That means setting up runbooks and having humans constantly react, rather than being able to set stronger rules that scale down resources before that unwelcome surprise bill arrives.
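Azure Budgets can raise alerts when actual or forecasted cost crosses a percentage of the budget amount. Below is a minimal sketch of that threshold logic. The straight-line month-end forecast is a simplifying assumption (Azure applies its own forecasting), and the function names are hypothetical. Note that the sketch, like a budget alert, only reports which thresholds were crossed. Acting on them is still up to a runbook or a person.

```python
import calendar
from datetime import date


def forecast_month_end(spend_to_date: float, today: date) -> float:
    """Naive straight-line forecast: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def crossed_thresholds(budget: float, amount: float, thresholds_pct: list[int]) -> list[int]:
    """Return the alert thresholds (percent of budget) that `amount` has reached."""
    return [t for t in thresholds_pct if amount >= budget * t / 100]


# Usage: a 1,000 budget with 600 spent by June 15 (a 30-day month).
budget = 1000.0
spend = 600.0
today = date(2025, 6, 15)
forecast = forecast_month_end(spend, today)

print("actual alerts:", crossed_thresholds(budget, spend, [50, 80, 100]))
print("forecast alerts:", crossed_thresholds(budget, forecast, [50, 80, 100]))
```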


Accessing Microsoft Graph API via Fabric Data Factory

Paul Hernandez makes a connection:

This article is an updated version of my 2022 post on using Synapse pipelines to retrieve security groups and their members through the Microsoft Graph API. Some customers recently asked for a Microsoft Fabric–based approach, and I also noticed that many developers are still defaulting to Python clients to interact with Graph. While Python works perfectly fine, this walkthrough demonstrates how you can accomplish the same using a parameterized Copy Data activity inside a Fabric Data Factory pipeline.

Read on to see how.
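Whether the caller is a Copy Data activity or a Python client, the underlying requests are Microsoft Graph REST calls with server-driven paging via `@odata.nextLink`. Below is a minimal sketch of the URL construction and paging loop, assuming the v1.0 endpoints. Here `fetch` is a hypothetical stand-in for an authenticated HTTP GET that returns parsed JSON, and the helper names and canned URLs are illustrative only.

```python
from typing import Callable, Iterator
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def security_groups_url() -> str:
    # $filter on securityEnabled returns only security groups; the value is URL-encoded.
    return f"{GRAPH_BASE}/groups?$filter={quote('securityEnabled eq true')}"


def members_url(group_id: str) -> str:
    # Direct members of one group; the per-group URL a parameterized activity would build.
    return f"{GRAPH_BASE}/groups/{group_id}/members"


def paged(url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield items from every page, following @odata.nextLink until it is absent."""
    next_url = url
    while next_url:
        page = fetch(next_url)
        yield from page.get("value", [])
        next_url = page.get("@odata.nextLink")


# Usage with canned (made-up) pages instead of real HTTP calls.
fake_pages = {
    security_groups_url(): {
        "value": [{"id": "g1"}],
        "@odata.nextLink": "https://graph.microsoft.com/v1.0/groups?$skiptoken=abc",
    },
    "https://graph.microsoft.com/v1.0/groups?$skiptoken=abc": {"value": [{"id": "g2"}]},
}
groups = list(paged(security_groups_url(), fake_pages.__getitem__))
print([g["id"] for g in groups], members_url(groups[0]["id"]))
```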


Connecting Microsoft Fabric to Azure DevOps via Service Principal

Yaron Pri Gal doesn’t need no steenkin’ passwords:

Following the announcement that service principal and cross-tenant support for Microsoft Fabric Git integration with Azure DevOps (ADO) is generally available, this blog post serves as a guide to connecting Fabric workspaces to Azure DevOps repositories using a service principal.

Fabric Git Integration is the foundation for organizations implementing fully automated CI/CD pipelines, enabling seamless movement of assets across Development, Test, and Production environments.

Currently, Fabric Git Integration supports two major Git providers: Azure DevOps and GitHub. This blog post addresses the new service principal capability for Azure DevOps.

Click through for more info and a link to Microsoft Learn that contains the instructions.
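A service principal authenticates with the OAuth 2.0 client credentials flow rather than with a user password. The sketch below only builds the token request for the Microsoft identity platform v2.0 endpoint. The Fabric API scope shown is an assumption to verify against Microsoft Learn, the `token_request` helper is hypothetical, and the actual Git connection setup follows the linked instructions.

```python
from urllib.parse import parse_qs, urlencode


def token_request(tenant_id: str, client_id: str, client_secret: str,
                  scope: str = "https://api.fabric.microsoft.com/.default") -> tuple[str, str]:
    """Build (url, form body) for an OAuth 2.0 client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # load from a secret store in practice
        "scope": scope,                  # ASSUMED Fabric API scope
    })
    return url, body


# Usage with placeholder identifiers; POST the body as application/x-www-form-urlencoded.
url, body = token_request("<tenant-id>", "<client-id>", "<client-secret>")
print(url)
print(parse_qs(body)["grant_type"])
```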
