Press "Enter" to skip to content

Category: DevOps

Serverless SQL Pool CI/CD

Kevin Chant doesn’t have time for manual deployments:

I want to cover one way you can do CI/CD for Azure Synapse Analytics serverless SQL pools using Azure DevOps in this post. Because I know it is a popular topic.

It’s related to my post about how you can create a dacpac for an Azure Synapse Analytics dedicated SQL pool using Azure DevOps. Since they are both based in the same service.

Plus, a while ago I wrote about the increase in demand for Data Platform automation. So, I really wanted to do a post about how you can do CI/CD for Azure Synapse Analytics serverless SQL pools.

Read on to learn how.

Comments closed

Architecting a Jenkins Replacement

Li Haoyi takes us through an internal Databricks tool for continuous integration:

Runbot is a bespoke continuous integration (CI) solution developed specifically for Databricks’ needs. Originally developed in 2019, Runbot incrementally replaces our aging Jenkins infrastructure with something more performant, scalable, and user friendly for both users and maintainers of the service. This blog post will explore the motivations behind developing Runbot, the core design decisions that went into it, and how we used it to greatly improve the experience of all the developers within the Databircks engineering organization.

It doesn’t look like the tool is available externally, but it’s an interesting read and helps understand some of the “why” behind the solution.

Comments closed

Scaling Limitations with Site Reliability Engineering

Tyler Treat argues that the Site Reliability Engineering paradigm doesn’t scale

:We encounter a lot of organizations talking about or attempting to implement SRE as part of our consulting at Real Kinetic. We’ve even discussed and debated ourselves, ad nauseam, how we can apply it at our own product company, Witful. There’s a brief, unassuming section in the SRE book tucked away towards the tail end of chapter 32, “The Evolving SRE Engagement Model.” Between the SLIs and SLOs, the error budgets, alerting, and strategies for handling change management, it’s probably one of the most overlooked parts of the book. It’s also, in my opinion, one of the most important.

Read on for an explanation of this chapter and how it applies to organizations trying to implement SRE.

Comments closed

Automating Semantic Versioning with Azure DevOps

Dave Ruijter shows how you can use Azure DevOps to perform automatic semantic versioning:

I am a fan of using semantic versioning (a.k.a. SemVer) for data solutions, following the v1.0.0 pattern. It helps in the communication between team members and stakeholders, by limiting ambiguity and misunderstandings related to the version of your solution’s releases. With semantic versioning, the trick is to increment the version according to the changes you have made since the latest release. Manually keeping track of that is not an easy task, especially for small teams, without the capacity to have somebody dedicated to this administration task. I found a way to make this a lot easier, leaning on the Pull Request description! And as a bonus, we will create some nice release notes automatically

Click through to see what you need to have set up on your Azure DevOps subscription and a detailed walkthrough of how to set it up.

Comments closed

Deploying Synapse Artifacts to a Managed vNet Workspace

Rui Cunha takes us through an Azure Synapse Analytics deployment scenario:

In my previous article, I demonstrated how we could easily use the Synapse Workspace Deployment extension to accomplish this second stage of the process. I’m now coming back to this topic as I realized that many of our customers were reporting difficulties in completing this second stage of their Synapse CICD process because they were failing to deploy Synapse artifacts to a Managed VNET Synapse Workspace.

In this particular scenario, the deployment was failing because their target workspace was not allowing access from public networks.

Fortunately, the answer isn’t “Allow access from public networks.” Click through to see what you can do instead.

Comments closed

CI/CD with Databricks Notebooks and Azure DevOps

Michael Shtelma and Piotr Majer get us started on an MLOps journey:

This is the first part of a two-part series of blog posts that show how to configure and build end-to-end MLOps solutions on Databricks with notebooks and Repos API. This post presents a CI/CD framework on Databricks, which is based on Notebooks. The pipeline integrates with the Microsoft Azure DevOps ecosystem for the Continuous Integration (CI) part and Repos API for the Continuous Delivery (CD).In the second post, we’ll show how to leverage the Repos API functionality to implement a full CI/CD lifecycle on Databricks and extend it to the fully-blown MLOps solution.

Click through for the article and a link to code. You can also see the pipeline YAML (and Python code it calls) in the repo.

Comments closed

Orchestrating ML Pipelines with Amazon Managed Workflows for Airflow

Juston Leto, et al, show off MLOps capabilities in AWS:

The ability to scale machine learning operations (MLOps) at an enterprise is quickly becoming a competitive advantage in the modern economy. When firms started dabbling in ML, only the highest priority use cases were the focus. Businesses are now demanding more from ML practitioners: more intelligent features, delivered faster, and continually maintained over time. An effective MLOps strategy requires a unified platform that can orchestrate and automate complex data processing and ML tasks, and integrates with the latest tooling to best complete those tasks.

This post demonstrates the value of using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate an ML pipeline using the popular XGBoost (eXtreme Gradient Boosting) algorithm. For more advanced and comprehensive MLOps capabilities, including a purpose-built model orchestration framework and a continuous integration and continuous delivery (CI/CD) service for ML, readers are encouraged to check out Amazon SageMaker Pipelines.

Read on for a step-by-step tutorial on the process.

Comments closed

Data Platform Deployments via Azure Test Plan

Kevin Chant shows off the power of Azure Test Plans:

In this post I want to cover using Azure Test Plans for Data Platform deployments. Because using it to manage test plans can be very useful.

By the end of this post, you will know what Azure Test Plans are and how they can be useful for data Platform deployments.

Click through to see how this feature in Azure DevOps works and how you can use it to test your deployments.

Comments closed

Renaming a YAML Pipeline in Azure DevOps

Hamish Watson figures out what’s in a name:

I had created a pipeline using YAML – which was called InfrastructureAsCode as the YAMP file was in the root directory.

However I wanted to move it into a folder .\InfrastructureAsCode\pipelines\… and run the YAML file from there – as I would have a non-prod and PROD version of them (as the schedule was different for each).

Click through to see how Hamish was able to resolve this.

Comments closed

Creating a dacpac for a Dedicated SQL Pool

Kevin Chant shows how to use Azure DevOps to create a dacpac for an Azure Synapse Analytics dedicated SQL pool:

By the end of this post, you will know how to create a dacpac for a dedicated SQL Pool within Azure Pipelines for your CI/CD deployments. Plus, how you can synchronize a database project created in Azure Data Studio with a Git repository in Azure DevOps.

In a previous post I covered how you can create a dacpac for an Azure Synapse Analytics dedicated SQL Pool using Azure Data Studio. In that post I stated that you could create a dacpac for the database project using Azure DevOps.

With this in mind, I will use the same database project that I created in that post.

Click through for the process.

Comments closed