Category: Source Control

Databricks Integration with Git Repos

Ka-Hing Cheung and Vaibhav Sethi announce that Databricks Repos is now generally available:

Thousands of Databricks customers have adopted Databricks Repos since its public preview and have standardized on it for their development and production workflows. Today, we are happy to announce that Databricks Repos is now generally available.

Databricks Repos was created to solve a persistent problem for data teams: most tools used by data engineering/machine learning practitioners offer poor or no integration with Git version control systems, forcing them to navigate through multiple files, steps and UIs to simply review and commit code. Not only is this time-consuming, but it’s also error-prone.

This has been a bit of a pain point with Databricks in the past, and they’ve come up with this solution. Given that Azure Synapse Analytics has some of the same pain points, I’d expect we’ll see something similar in time.
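
If you'd rather script the hookup than click through the UI, Databricks also exposes a Repos REST API for attaching a Git repo to a workspace. Here's a minimal sketch in Python, assuming a workspace URL and personal access token; the host, token, and repo values below are all placeholders:

```python
# Minimal sketch: attach a Git repo to a Databricks workspace via the
# Repos REST API (POST /api/2.0/repos). Host, token, and repo values
# are placeholders, not real credentials.
import requests

DATABRICKS_HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "dapi-..."  # placeholder personal access token

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/repos",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "url": "https://github.com/example-org/example-repo.git",  # Git remote to clone
        "provider": "gitHub",                                      # Git provider name
        "path": "/Repos/someone@example.com/example-repo",         # workspace path for the clone
    },
)
resp.raise_for_status()
print(resp.json())  # response includes the repo id, branch, and HEAD commit
```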


Working on Multiple Repos with Azure Data Studio

Deborah Melkin shows off a feature of Azure Data Studio:

If you read my T-SQL Tuesday post from this month, I mentioned that I’ve been using Azure Data Studio on a daily basis. One of the things that I find I use it for the most is source control with Git. I’m incredibly surprised by this. Maybe it comes from years of using Management Studio and not being able to check in code from the tool that I’m using to write it. (Or maybe I’ve been able to do that all this time and no one told me…?)

As I’m using it, I found two things that have helped me out. So naturally, I thought I’d share.

Click through for information on how to use multiple repos, as well as a bonus item.
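
Azure Data Studio is built on the VS Code shell, so each repo folder you add to a workspace gets its own entry in the Source Control view. As a rough illustration of the multi-repo idea outside the tool, here's a hedged Python sketch that walks a folder of clones and reports each repo's branch and state; the workspace path is hypothetical, and it assumes the git CLI is on your PATH:

```python
# Rough sketch of the multi-repo idea: find every Git repo under a
# workspace folder and report its current branch and dirty/clean state.
# The workspace path is hypothetical; requires the git CLI on PATH.
import subprocess
from pathlib import Path

workspace = Path.home() / "source"  # hypothetical folder holding several clones

for repo in sorted(p.parent for p in workspace.glob("*/.git")):
    branch = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    dirty = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(f"{repo.name}: {branch} ({'dirty' if dirty else 'clean'})")
```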


Azure Data Factory and Source Control

Ahmad Yaseen shows how you can save Azure Data Factory pipelines in source control:

To overcome these limitations, Azure Data Factory provides us with the ability to integrate with a Git repository, such as an Azure DevOps or GitHub repository. This helps in tracking and versioning pipeline changes and incrementally saving those changes during development, without the need to validate an incomplete pipeline, and prevents the changes from being lost in case of any crash or failure. In this case, you will be able to test the pipeline, revert any change that is detected as a bug, and publish the pipeline to the Data Factory when everything is developed and validated successfully.

Click through for the setup instructions.
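
If you prefer to script that setup rather than click through the portal, the Git hookup is just a property on the factory resource. A hedged sketch using the azure-mgmt-datafactory Python SDK, where the subscription, resource group, factory, and Azure DevOps names are all placeholders:

```python
# Hedged sketch: wire an existing Data Factory to an Azure DevOps Git repo
# via the azure-mgmt-datafactory SDK. All names below (subscription,
# resource group, factory, org, project, repo) are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory, FactoryVSTSConfiguration

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

factory = Factory(
    location="eastus",
    repo_configuration=FactoryVSTSConfiguration(
        account_name="example-org",       # Azure DevOps organization
        project_name="example-project",
        repository_name="adf-pipelines",
        collaboration_branch="main",      # branch you develop against
        root_folder="/",                  # folder in the repo holding the ADF JSON
    ),
)

client.factories.create_or_update("example-rg", "example-adf", factory)
```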


Adding a Database Project to GitHub

Elizabeth Noble shows how you can get your brand new Azure Data Studio project into GitHub:

Once you have the database project created, you’ll want to get your database project added to source control so that you (and others) can modify and manage your database code. This next step is the beginning of allowing you and others to work on the same databases and minimize the risk of overwriting someone else’s work or deploying the wrong code to Production.

Tools like GitHub Desktop and SourceTree have definitely made things easier, especially for happy-path scenarios.
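
If you'd rather skip the GUI tools entirely, publishing a new project comes down to a handful of git commands. A minimal sketch, wrapped in Python for convenience; the project path and remote URL are placeholders, and it assumes the git CLI is installed and you've already created an empty repo on GitHub:

```python
# Minimal sketch: publish a new database project folder to GitHub from the
# command line. The project path and remote URL are placeholders; requires
# the git CLI and an empty GitHub repo created ahead of time.
import subprocess

project = "C:/src/AdventureWorksProject"  # hypothetical database project folder
remote = "https://github.com/example-user/AdventureWorksProject.git"

def git(*args):
    subprocess.run(["git", "-C", project, *args], check=True)

subprocess.run(["git", "init", project], check=True)  # create the local repo
git("add", ".")                                       # stage the .sqlproj and scripts
git("commit", "-m", "Initial database project")
git("branch", "-M", "main")
git("remote", "add", "origin", remote)
git("push", "-u", "origin", "main")                   # publish to GitHub
```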


Thoughts on Using Source Control

Kevin Chant shares some thoughts:

In this post I want to cover more thoughts about SQL Server professionals using version control, because I have had some interesting conversations since my last post about it.

In a previous post I covered how SQL Server professionals can benefit from using version control, which you can read in detail here.

Now I want to clarify a few things relating to it as well.

Read on for those thoughts.


Tying Azure Data Factory to Source Control

Eddy Djaja explains why you really want to tie Azure Data Factory to your source control:

Azure Data Factory (ADF) is Microsoft’s ETL, or more precisely ELT, tool in the cloud. For more information on ADF, Microsoft provides an introduction at this link: https://docs.microsoft.com/en-us/azure/data-factory/introduction. Some have argued over whether ADF will replace or complement the “on-premise” SSIS; it is uncertain, and only time can tell what will happen in the future.
Unlike SSIS, authoring in ADF does not use Visual Studio. ADF authoring uses a web browser to create ADF components, such as pipelines, activities, datasets, etc. The simplicity of authoring ADF may confuse novice developers about how ADF components are saved, stored, and published. When logging in to ADF for the first time after creating a data factory, the authoring is in Data Factory mode. How do we know?

Click through for the explanation and some resources on how to do it.
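
One quick answer to "how do we know?": read the factory's repo configuration, which comes back empty when you are authoring in live Data Factory mode. A hedged sketch using the azure-mgmt-datafactory Python SDK, with placeholder names throughout:

```python
# Hedged sketch: read a factory's repo configuration to see whether it is
# connected to Git or still in live "Data Factory" authoring mode. The
# subscription, resource group, and factory names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
factory = client.factories.get("example-rg", "example-adf")

if factory.repo_configuration is None:
    print("No Git repo attached: changes persist only when published.")
else:
    cfg = factory.repo_configuration
    print(f"Git-connected: {cfg.repository_name} / {cfg.collaboration_branch}")
```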


A Script for Peer Reviews of Code

Paul Andrew shares with us a script which is useful for peer reviewing code before check-in:

Does the code include good comments? Things that explain the reason why the logic has been implemented, to assist future developers looking at the same code.

All too often I see code comments that just translate the code into English and tell me what is happening, which should be fairly obvious to the reviewer as they read the code. The why is so much more important.

It’s a 20-point checklist, but worth reviewing and adapting for your own purposes.
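
Paul's point about "why" comments deserves a concrete illustration. A trivial Python example, with an invented throttling rule, showing the difference:

```python
# Trivial illustration of "what" vs. "why" comments; the throttling rule
# is invented for this example, not taken from Paul's checklist.

retries = 3

# "What" comment (weak): double the retry count.
# "Why" comment (better): the upstream API throttles hard right after a
# deployment, so a larger retry budget keeps the nightly load from failing
# on transient errors.
retries *= 2
```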
