If I could teach SQL to analysts who plan to work in industry data science, I’d start by sharing a few SQL Truths I’ve learned, and why I recommend tracking SQL queries in git. Here goes:
- You will *always* need that query again
- Queries are living artifacts that change over time
- If it’s useful to you, it’s useful to others (and vice versa)
Focusing on these points has led to my continuous adoption of a query library — a git repository for saving and sharing commonly (and uncommonly) used queries, all while tracking any changes made to these queries over time.
This is separate from keeping database objects (like table or procedure definitions) in source control.
But wait, I would need to learn an additional tool?
Yes, but don’t panic! Git is a tool with various commands that you can use to help track your changes. Luckily, you don’t need to know too many commands in Git to use the basic functionality. As an added bonus, using Git with RStudio takes away some of the burden of knowing Git commands by including buttons for common actions.
As with any tool that you pick up to help your scientific workflows, there is some upfront work before you can start seeing the benefits. Don’t let that deter you. Git can be very easy once you get the gist. Think about the benefits of being able to track changes: you can make some changes, have a record of that change and who made it, and you can tie that change to a specific problem that was reported or feature request that was noted.
It’s still code, and you gain a lot by keeping code in source control.
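To make that concrete, here is a minimal sketch of starting a query library with git. The paths, file names, and query are all made up for illustration; the point is that every saved query gets a descriptive commit message you can search later.

```shell
# Start a fresh query-library repo (all paths and names here are illustrative).
rm -rf /tmp/query-library
mkdir -p /tmp/query-library && cd /tmp/query-library
git init -q
git config user.email "analyst@example.com"   # identity for this repo only
git config user.name  "Analyst"

# Save a commonly used query...
cat > monthly_active_users.sql <<'SQL'
-- Monthly active users (table and column names are hypothetical)
SELECT DATE_TRUNC('month', activity_date) AS month,
       COUNT(DISTINCT user_id)            AS active_users
FROM user_activity
GROUP BY 1;
SQL

# ...and commit it with a message explaining what it is.
git add monthly_active_users.sql
git commit -q -m "Add monthly active users query"
git log --oneline   # one line per saved change: who, what, when
```

Six months from now, `git log` is what turns "you will always need that query again" from a problem into a lookup.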
I almost always have trouble remembering which option is for "here's a non-empty folder with a bunch of files I want to dump into the repo to start with" versus "here's an empty folder where I want to pull down the contents of an existing repo." Fortunately, Tortoise yells at you if you try the latter — Export — into a non-empty folder. So we want Import. Assuming you have a folder where all your SQL scripts already live, right-clicky and say "Tortoise SVN .. Import."
Check it out. The only concern I have is that this source control is just local source control. That’s very helpful in situations where you accidentally mess something up, but my preference is to put my code in the same source control system the developers are using. And if the developers aren’t using source control, get that institution in place as soon as possible because that’s begging for trouble.
Fantastic, one build task created! How easy was that??
Let’s test by running `az acr build-task run --registry TestContainerRegistry01 --name buildsqlimage`.
And the progress of the build task can be monitored with `az acr build-task logs --registry TestContainerRegistry01`.
Andrew gives us the step-by-step details, so check it out.
Yeah, things get messy, no matter what model you choose for securing your PROD data:
- Create one user and give it all rights to the database
- Create specific users and give them the least amount of rights to do what they must, and no more
- Somewhere in between the previous 2
Truly, the third option is generally the answer. Let’s say that you give the application all the rights that any user of the system can have, and let the application dole out the rights to individuals. This is not a terrible plan, but I dare say that many databases contain data or utilities that you don’t want to hand to general users. (My utility schema generally has tools to maintain and release code, something that you don’t want general users to have access to. And lest you have a developer working “with” you like I once did, you don’t want the application to have access to the tools that disable all of the constraints in the database, even if you have ETL uses for that code.)
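As a sketch of what a middle-ground grant script might look like, the commands below emit least-privilege T-SQL for a hypothetical application role. The role, schema, and file names are all assumptions for illustration: the app role gets DML on the application schema only, and deliberately gets nothing on the utility schema that holds release and ETL tooling.

```shell
# Generate a least-privilege grant script (T-SQL; every name is hypothetical).
cat > /tmp/grant_app_role.sql <<'SQL'
CREATE ROLE app_user;

-- DML on the application schema only.
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::app TO app_user;

-- Deliberately NO grants on SCHEMA::utility, where the release and
-- ETL tools (including the constraint-disabling code) live.
SQL
cat /tmp/grant_app_role.sql
```

Keeping a generated script like this in source control also gives you a reviewable record of exactly which rights the application holds.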
Check it out for some examples.
For the advocates of Python, a commonly used application is Jupyter Notebook. Jupyter Notebook is a server-client application that lets you edit and run Python code in a web browser, combining code, SQL, equations, text, and visualizations in a single document. It also offers syncing with GitHub repositories.
More specifically, Jupyter notebooks are rendered by GitHub directly on your repo page. This means you can enjoy all the benefits Git offers for version control, branching, merging, and collaborative development while working in Jupyter Notebook.
The best strategy is probably a multi-tiered strategy. It absolutely starts with source control, but it doesn’t have to end there.
I will assume that you already have TFS set up, are just using HTTP, and want to make things a bit more secure with HTTPS. I am also assuming that you will be using port 443 for HTTPS traffic.
To update TFS to use HTTPS you need to do a couple of things:
- Have a legitimate certificate installed on the server that you can bind to
- Have an IP address on the server, with firewall access set up to that IP address on port 443
But there are a few more steps as well, so click through to see them all.
Of course, this just scratches the surface of using source control and Git. Now that you have a working example, I encourage you to read more especially about branching and merging, and your inevitable merge conflicts. I think branching is incredibly important if you like to tweak your scripts because it gives you a separate copy of your code to work on and test, and when you’re satisfied you merge your changes back to your master branch (or trunk).
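The branch-tweak-merge cycle described above can be sketched in a few commands. The repo path, branch name, and script are all illustrative; the shape is what matters: branch, change, commit, switch back, merge.

```shell
# Sketch of a branching workflow (all names and paths are made up).
rm -rf /tmp/ql-demo
mkdir -p /tmp/ql-demo && cd /tmp/ql-demo
git init -q
git config user.email "analyst@example.com"
git config user.name  "Analyst"
echo "SELECT 1;" > smoke_test.sql
git add . && git commit -q -m "Initial query"

git checkout -q -b tweak-smoke-test      # a separate copy of the code to work on
echo "SELECT 1 AS ok;" > smoke_test.sql  # tweak and test in isolation
git commit -q -am "Alias the result column"

git checkout -q -                        # back to the main branch
git merge -q tweak-smoke-test            # satisfied? fold the change back in
cat smoke_test.sql                       # prints: SELECT 1 AS ok;
```

When two branches change the same lines, the merge stops and asks you to resolve the conflict by hand — that is the "inevitable merge conflict" worth reading up on before it happens.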
Once you get the basics down, it’s really easy to start contributing to other projects too! You could fork a project, work on it, and then submit a pull request to the owners to add features. Or maybe someone will find code you made available and do the same. It’s amazing. At a very minimum, this will get you up to speed on how you can introduce an SCM practice for SQL-related scripts at your workplace if you don’t have one, or how you can start using it if you do (and don’t already).
Click through to see how Drew integrates Git with SQL Operations Studio. Spoilers: it’s pretty easy, given the relationship between SqlOps and Visual Studio Code.
This post is for a specific type of person. If you:
- Are new to source control
- Are getting started on your path to continuous delivery nirvana
- Have been able to get your database into some sort of source control system
- Have more than one person checking in code to the database source
- Are unsure what to do next
Then this post is for you!
This is a nice post with some next-steps for when you have a database in source control but aren’t quite sure what to do next.
@brianszadek says: I am looking to automate the scripting of objects with PowerShell for easier source control integration.
I thought this would be a little bit easier, but it looks like the SMO way of doing this is still fairly complicated. Maybe that’s something I can look at simplifying for the project in a bit.
If you want to get your database into source control but feel like it’s going to be a time-consuming challenge, this isn’t the time-consuming part.