Category: Cloud

Azure Data Factory Publishing Everything instead of Incremental Changes

Ed Elliott troubleshoots an issue:

I recently encountered an interesting issue with ADF where the publish feature suddenly attempted to republish every single object, claiming they were new, despite having incrementally published changed objects for some time.

We were using the publish feature where you work on a branch until you are happy, then you raise a PR to main, merge to main, and then switch back to ADF and click publish to push the changes to the adf_publish branch.

Click through for the answer. I also love how Ed’s tl;dr is “too bad, read it anyhow.”

Azure Data Factory Data Flow Logging

Rayis Imayev does a bit of logging:

Azure Data Factory is no exception when it comes to logging options. All your debug or triggered pipeline executions, along with the parameters passed during execution, statuses, timings, durations, and more, can be monitored natively in Azure Data Factory Studio. Once you immerse yourself in the realm of previously executed pipelines and start seeing all activities, passed input values, processed output results, and variables being transformed into something else that can only be understood by examining internal expressions and many other details, you begin to feel like an investigator meticulously analyzing everything.

Read on to see what kinds of logging options are available and how you can work with them.

Fronting Fabric APIs with Azure API Management

Ed Lima combines expensive with expensive:

Integrating Azure API Management (APIM) with Microsoft Fabric’s API for GraphQL can significantly enhance your API’s capabilities by providing robust scalability and security features such as identity management, rate limiting, and caching. This post will guide you through the process of setting up and configuring these features.

API Management is a really neat service, though it’s rather costly. That’s my biggest complaint about it, though it is a doozy.

Optimizing a Snowflake Data Warehouse

Harshavardhan Yedla gives us some guidance:

Optimizing a Snowflake data warehouse (DWH) is crucial for ensuring high performance, cost-efficiency, and long-term effectiveness in data processing and analytics. The following outlines the key reasons optimization is essential:

Read on for some tips around optimizing Snowflake warehouses. A lot of this stays at a pretty high level and doesn’t provide detailed guidance, but it’s a good checklist for thinking about your own situation.

Kafka Connector for Cosmos DB

Sudhindra Sheshadrivasan announces that a new connector has become generally available:

We’re excited to announce the General Availability (GA) of the Confluent fully managed V2 connector for Apache Kafka® for Azure Cosmos DB! This release marks a major milestone in our mission to simplify real-time data streaming from and to Azure Cosmos DB using Apache Kafka®.

The V2 connector is now production-ready and available directly from the Confluent Cloud connector catalog. This managed connector allows you to seamlessly integrate Azure Cosmos DB with your Kafka-powered event streaming architecture—without worrying about provisioning, scaling, or managing the connector infrastructure.

Read on to learn more about the new connector and what it takes to hook everything up.

Checking Valid Configuration Items for Azure SQL Managed Instances

Ben Johnston looks at the art of the possible:

In my last post I described issues that might stop your migration to a SQL Server Managed Instance (SQL MI). This post covers configuration items that differ or are not supported in SQL MI. These likely won’t stop your migration, but they could slow you down if you aren’t ready for these changes.

As with previous issues discussed, testing your migration is key. Validate all of your settings and be prepared to make some changes during your migration process. Most of the incompatible options make sense when you think about the purpose of SQL MI – it is controlled by Microsoft. Hardware settings, local file access, high-availability settings, and auditing are configured differently or completely disabled.

Click through to see what you can and cannot do when it comes to configuration.

Goodbye, Azure ML SDK v1

I have a new video:

In this video, I cover some news from Microsoft around the deprecation of the Azure Machine Learning SDK v1. We’ll take a look at the upgrade guide and see what it will take to perform this upgrade.

Microsoft will still support the SDK v1 until September of 2026, so we have a year to get code sorted out. The CLI v1, however, will go away sooner, so be sure you’re keeping up on that.

Executing a Fabric Data Pipeline from Azure Data Factory

Koen Verbeeck leaves the confines of Microsoft Fabric:

In the blog post Call a Fabric REST API from Azure Data Factory I explained how you can call a Fabric REST API endpoint from Azure Data Factory (or Synapse if you will). Let’s go a step further and execute a Fabric Data Pipeline from an ADF pipeline, which is a common request. A Fabric capacity cannot auto-resume, so you typically have an ADF pipeline that starts the Fabric capacity. After the capacity is started, you want to kick-off your ETL pipelines in Fabric and now you can do this from ADF as well.

Click through for the process. Though do check the warnings that Koen offers around either spending extra money by remaining in synchronous execution mode, or always getting a positive result in asynchronous execution mode, regardless of whether the underlying Fabric Data Pipeline worked or not.
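
For a rough idea of what ADF ends up calling, here is a minimal Python sketch that only builds the request URL and sends nothing. It assumes the Fabric job scheduler's on-demand run endpoint (`/workspaces/{workspaceId}/items/{itemId}/jobs/instances` with `jobType=Pipeline`), which is my understanding of the public REST API rather than something the excerpt spells out. The function name and the placeholder IDs are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumption: public Fabric REST API root; not stated in the excerpt above.
FABRIC_API_ROOT = "https://api.fabric.microsoft.com/v1"


def pipeline_run_url(workspace_id: str, pipeline_id: str) -> str:
    """Build the URL an ADF Web activity would POST to in order to start a run.

    Both IDs are percent-encoded so stray characters cannot alter the path.
    """
    path = (
        f"/workspaces/{quote(workspace_id, safe='')}"
        f"/items/{quote(pipeline_id, safe='')}/jobs/instances"
    )
    # Assumption: jobType=Pipeline selects a Data Pipeline run.
    return f"{FABRIC_API_ROOT}{path}?{urlencode({'jobType': 'Pipeline'})}"


if __name__ == "__main__":
    # Placeholder GUIDs for illustration only.
    print(
        pipeline_run_url(
            "00000000-0000-0000-0000-000000000001",
            "00000000-0000-0000-0000-000000000002",
        )
    )
```

As I understand it, a POST like this returns as soon as the job is accepted, which fits the asynchronous caveat: a successful call tells you the run was queued, not that the pipeline succeeded.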

Running Cron Jobs in Azure Database for PostgreSQL Flexible Server

Josephine Bush schedules a task:

pg_cron is a simple cron-based job scheduler for PostgreSQL that runs inside the database as an extension. It allows you to schedule PostgreSQL commands directly from your database, similar to using cron jobs at the operating system level. pg_cron on PG Flex is pretty easy to use, which makes it simple to schedule regular database maintenance and processing tasks directly from within PostgreSQL.

Read on to see how to install the extension, and then how to manage cron jobs. Josephine also lays out some limitations when using pg_cron on Azure and how to track failed jobs.
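
As a small illustration of what scheduling a job looks like from client code, here is a Python sketch that prepares a parameterized call to pg_cron's `cron.schedule(job_name, schedule, command)` function. It only builds the statement and its arguments; running it needs a PostgreSQL driver and a connection, which are left out. The helper name, the job name, and the five-field check are my own assumptions for illustration, and pg_cron may accept schedule formats this simple check rejects.

```python
from typing import Tuple


def build_schedule_call(
    job_name: str, schedule: str, command: str
) -> Tuple[str, Tuple[str, str, str]]:
    """Return a parameterized pg_cron scheduling statement and its arguments.

    The %s placeholders follow the style used by common Python PostgreSQL
    drivers, so the driver handles quoting of the job name and command.
    """
    # Simplified sanity check (an assumption): classic cron has five fields.
    if len(schedule.split()) != 5:
        raise ValueError(f"expected a five-field cron expression, got {schedule!r}")
    sql = "SELECT cron.schedule(%s, %s, %s)"
    return sql, (job_name, schedule, command)


if __name__ == "__main__":
    # Hypothetical job: run VACUUM ANALYZE every day at 03:00.
    sql, params = build_schedule_call("nightly-vacuum", "0 3 * * *", "VACUUM ANALYZE")
    print(sql, params)
```

Passing the SQL and the tuple to a driver's `execute` method would register the job; keeping the command as a bound parameter avoids hand-quoting SQL inside SQL.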
