Press "Enter" to skip to content

Category: DevOps

CI/CD with Databricks

Sumit Mehrotra takes us through the continuous integration story around Databricks:

Development environment – Now that you have delivered a fully configured data environment to the product (or services) team in your organization, the data scientists have started working on it. They are using the data science notebook interface that they are familiar with to do exploratory analysis. The data engineers have also started working in the environment, and they like working in the context of their IDEs. They would prefer a connection between their favorite IDE and the data environment that allows them to use the familiar interface of their IDE to write code and, at the same time, use the power of the data environment to run unit tests, all in the context of their IDE.

Any disciplined engineering team would take their code from the developer’s desktop to production, running through various quality gates and feedback loops. As a start, the team needs to connect their data environment to their code repository on a service like git so that the code base is properly versioned and the team can work collaboratively on the codebase.

This is more of a conceptual post than a direct how-to guide, but it does a good job of getting you on the right path.
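
If you want to experiment with the IDE-to-cluster piece yourself, databricks-connect is one route in; a minimal sketch, assuming a local Python environment and an existing cluster (the configure step prompts for your workspace URL, token, and cluster ID):

    # Pin the client library to match the cluster's Databricks Runtime (6.x here is illustrative)
    pip install "databricks-connect==6.*"
    # One-time setup: prompts for workspace URL, personal access token, and cluster ID
    databricks-connect configure
    # Smoke-test the connection from your machine to the remote cluster
    databricks-connect test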


Reading Azure DevOps Results in PowerShell

Mark Broadbent doesn’t let the lack of an official PowerShell module get in the way:

In my post Using Azure CLI to query Azure DevOps, I explained how you can use the Azure CLI to query Azure DevOps so you can obtain useful information on builds, releases, and more. The solution required a certain level of skill with JMESPath to manipulate your result sets, which, as explained, can be a little confusing.

However, once you have a bare-bones result set, it is likely that you will want to consume these results in a more user-friendly environment such as PowerShell so that you can build upon these data sets. I thought this would be an easy thing to do, but as you will see below, it was anything but.

Read on for some thoughts and a sample script.
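
For flavor, here is a minimal sketch of round-tripping that CLI output into PowerShell objects; the organization and project names are placeholders, and the azure-devops CLI extension is assumed to be installed:

    # Ask the Azure DevOps CLI for raw JSON rather than a JMESPath projection
    $json = az pipelines build list `
        --organization 'https://dev.azure.com/MyOrg' `
        --project 'MyProject' `
        --output json

    # az emits an array of lines; join them so ConvertFrom-Json also works in Windows PowerShell
    $builds = ($json -join "`n") | ConvertFrom-Json
    $builds | Select-Object id, buildNumber, status, result | Format-Table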


Scripting and Deploying SQL Agent Jobs

Alex Yates shows how you can incorporate SQL Agent jobs in your CI/CD process:

Basically, we need to put all the SQL Agent Job .sql scripts into a git repo. Then we need a PowerShell script that executes each .sql script against the necessary target databases. If you use SSDT, you might prefer to use a post-deployment script to do this. That bit should be reasonably straightforward. I’ll leave that as a task for the user since I’m short on time.

You probably want to put some thought into whether your agent jobs are scoped to a particular database, used for general server admin on a specific server, or standardised across many servers, since this may affect where you choose to put your jobs in source control and on what schedule you want to deploy them.

It may also make sense to set up MSX (SQL Server’s multiserver administration master server) if you have a central server. That would make Agent job deployment easier and you can still script out which sets of servers get which jobs.
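
As a concrete starting point, the deployment loop Alex describes can be as small as this sketch (the folder layout and server names are assumptions, and it requires the SqlServer module):

    # Run every scripted Agent job against each target server's msdb database
    $servers = 'SQL01', 'SQL02'   # illustrative server names
    foreach ($server in $servers) {
        Get-ChildItem -Path '.\agent-jobs' -Filter '*.sql' | ForEach-Object {
            # Agent job definitions live in msdb, so execute each script there
            Invoke-Sqlcmd -ServerInstance $server -Database msdb -InputFile $_.FullName
        }
    }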


Release Flow Branching and Database DevOps

Kendra Little explains why the Azure DevOps Release Flow model can work well for database activity:

But how do you use branches? It’s helpful to pick a strategy. There are many fine Git branching strategies out there, things like GitFlow and GitHub Flow and more — enough that it’s overwhelming to learn about these when you are just starting out.

The strategy that I recommend for folks who are starting out with database DevOps and Git is the Azure DevOps team’s Release Flow model with dedicated development databases. (Why dedicated development databases? Read more here.)

Read on to learn why.
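
Mechanically, Release Flow is a discipline over ordinary Git commands; a quick sketch with illustrative branch names:

    # Day-to-day work: topic branches come off main and return via pull request
    git checkout -b topic/add-customer-index main
    # ...commit, push, and open a PR targeting main...

    # At release time, cut a release branch from main
    git checkout -b release/2019.10 main

    # Hotfixes land on main first, then get cherry-picked into the release branch
    git checkout release/2019.10
    git cherry-pick abc1234   # illustrative commit hash from main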


Using Azure DevOps for Power BI CI/CD

Marc Lelijveld and Ton Swart look at today’s CI/CD options for Power BI:

As developers, we might be used to working with Git repositories, especially in order to have release management in place. Git is well known as a modern version control system. By using Git, you will have a local copy of the code on your machine as well. Based on these local copies, you can continue developing. After you’re finished with your work, you can easily push your local repository to merge with the online (shared) repository. By doing this, only the changes will be pushed and saved in the online repository. In fact, a new version is created only for the new code.

Versioning of Power BI files is a whole different story. Since pbix files are binary files, there is no way to check in only the code changes. The process of pushing changes treats the pbix file as a single object with a new version.

Read on for the state of the art. To be honest, I don’t like the state of the art that much, but that has nothing to do with Marc and Ton’s great article.
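
One practical corollary: since .pbix files are binary, it is worth telling Git up front to treat them as opaque instead of attempting text diffs and merges. A tiny sketch, run once at the repository root:

    # Mark .pbix files as binary so Git never tries to diff, merge, or normalize them
    Add-Content -Path .gitattributes -Value '*.pbix binary'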


Query Folding, Azure DevOps, and Power BI

Eugene Meidinger tries to work around a query folding limitation:

Query folding is one of the most powerful tools in Power Query and Power BI. It is the automatic process of pushing down filters and other transformations back to the data source. This can dramatically improve performance for your queries.

Unfortunately, OData is not guaranteed to support query folding, according to the Power BI documentation on incremental refresh.

Click through for Eugene’s alternative solution.


Scaling Out Continuous Integration

Chris Adkin shows off parallelism in Azure DevOps continuous integration pipelines:

A SQL Server Data Tools project is checked out of GitHub and built into a DacPac; four containerized SQL Server instances are spun up using clones of the ‘Seed’ Docker volume. The DacPac is applied to a database running inside each container, against which a tSQLt test is then executed; at the very end, the tSQLt results are aggregated and published.

This is an interesting approach to the problem of lengthy tests: run them on several separate machines concurrently.
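
Under the assumptions in the quote (pre-made clones of the ‘Seed’ volume; the image tag, names, and ports below are illustrative), the container fan-out might look roughly like this:

    # Spin up four SQL Server containers, each mounting its own clone of the Seed volume
    $saPassword = $env:SA_PASSWORD   # assumed to be set securely elsewhere
    1..4 | ForEach-Object {
        docker run -d --name "sqltest$($_)" `
            -v "seed-clone-$($_):/var/opt/mssql" `
            -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=$saPassword" `
            -p "$(1430 + $_):1433" `
            mcr.microsoft.com/mssql/server:2017-latest
    }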


Local Database Builds with Jenkins

Steve Jones continues a series on continuous integration, containers, and all that is good in life:

The only way to build a database project in SQL Server is with an actual SQL Server. In this case, I don’t have any code that would error on LocalDB, so I’ll just use that. I could specify my local SQL Server development database if I had the need.

This is a test build, so I also don’t need any SQL Compare options or other switches.

Getting code into source control and building continuous integration around it has become a lot easier over the past several years. Easy enough that you can work a simple system out in a day or two of experimentation.
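
If you are following along with Redgate’s SQL Change Automation PowerShell module (which the mention of SQL Compare options hints at), the LocalDB build step looks roughly like this sketch; the project path is a placeholder:

    # Validate the project by building it against a throwaway LocalDB database
    $project = 'C:\git\MyDatabaseProject'   # placeholder path to the database project
    $buildArtifact = Invoke-DatabaseBuild $project `
        -TemporaryDatabaseServer 'Data Source=(LocalDB)\MSSQLLocalDB'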


The State of DevOps for Data Platform Professionals

Kendra Little summarizes the Accelerate: State of DevOps Report 2019 with a focus on what this means for data platform professionals:

While there are a ton of valuable insights in the report, in this post I will focus on the findings which I believe are most relevant to those of us who work “close to a database.” There are three very interesting aspects of the research which hit close to home:

1. Speed and stability are not tradeoffs
2. Heavy change processes negatively impact speed and stability
3. Communities of practice are a common and successful tool to transform culture

Read on for Kendra’s detailed notes.


Requirements for DevOps Success

Grant Fritchey lays out what you need to succeed at DevOps:

The project managers and others flip. Delivery is slipping. The amount of code being written has changed. Stuff is happening that wasn’t on the schedule. The implementation of DevOps is shut down quickly.

You have to get buy-in from management before you attempt to implement DevOps or it will fail. They have to understand what you’re doing, why you’re doing it, and the measurable benefits it will bring.

Click through for some good thoughts, none of which is “use this software.”
