The Steps Of A Database Deployment

I have a series on near-zero downtime. In this post, I cover some of the key assumptions in the series as well as the steps in a database deployment:

Database Release
The database release phase is the first “primary” phase. It usually starts on a schedule, maybe 2 PM on a Wednesday or maybe “every day at 9 AM, 1 PM, 6 PM, and 10 PM” for more mature shops. Depending upon how much of an effect our release process normally has on end users, we might alert them that we expect to see a degradation in services starting at this point.

This phase of the release has us push out our database changes. This can involve creating or altering database objects but will not involve dropping existing objects.

Our database changes should support the blue-green deployment model. At this point in the process, all of the application code is “blue”—that is, the current production code. Our procedure changes need to be able to support that code without breaking. If we need to drop a column from a stored procedure, for example, we would not want to do it here. If we need to add a column to a stored procedure, we might do it here as long as it doesn’t break the calling code.

This is two topics smashed together into one post, but gives you an idea of a mental model around database deployments.

Creating Custom Helm Charts For SQL Server

Andrew Pruski shows us how we can build our own Helm chart to deploy SQL Server to Kubernetes:

Navigate to the new directory: –

cd C:\Helm

And now create the new chart!

helm create testsqlchart

OK, what that has done is create an empty chart so we need to drop in our yaml configuration files.

Read on to see how to generate the chart and use it to deploy SQL Server.

Platform Compatibility and SSDT

Ed Elliott walks us through platform compatibility in SQL Server Data Tools:

Sometimes you don’t have the perfect development environment for SQL Server, sometimes you deploy to things like SQL Azure and want to test locally, for various reasons it is possible that you want to deploy to one version of SQL Server but set the project properties to a different version of SQL Server. If you find yourself in this situation you might need to use the parameter AllowIncompatiblePlatform
 which essentially says “I know I said the project is for SQL 2017 but I am deploying to SQL 2014”, anytime you say this you also sign the contract that says “If I am deploying to a lower version then I have made sure I haven’t used any keywords or object types that didn’t exist in that version and also if everything fails then so be it, I live with my choices every day”.

The story is a little complicated, but Ed straightens it out for us.

Deploying SQL Server 2019 Big Data Clusters With Kubernetes

Chris Adkin has the start of a new series:

Minikube is a good learning tool and Microsoft provides instructions for deploying a big data cluster to this ‘Platform’. However, its single node nature and the fact that application pods run on the master node means that this does not reflect a cluster that anyone would run in production. Kubernetes-as-a-service is probably by far the easiest option for spinning a cluster up, however it relies on an Aws, Azure or Google Cloud Platform account, hence there is a $ cost associated with this. This leaves a vanilla deployment of Kubernetes on premises. Based on the assumption that most people will have access to Windows server version 2008 or above, a relatively cheap and way of deploying a Kubernetes cluster is via Linux virtual machines running on Hyper-V. This blog post will provide step by step instructions for creating the virtual machines to act as the master and worker nodes in the cluster. 

This is going on my “try this out when I have time” list.  So expect a full report sometime in the year 2023.

Mutli-Branch Pipelines In Jenkins

Chris Adkin continues his SQL Server continuous integration series:

Whatever you elect to do there will always be a master branch, where you go from here depends on whether you favor branching or feature toggles. Wikipedia provides a nice definition of what a feature toggle is, thus:

feature toggle[1] (also feature switchfeature flagfeature flipperconditional feature, etc.) is a technique in software development that attempts to provide an alternative to maintaining multiple source-code branches (known as feature branches), such that a feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during run time. For example, during the development process, a developer can enable the feature for testing and disable it for other users.[2]

A branch is initially a clone of the master branch to begin with, developers work on the branch. Once the work on that branch is code complete and it has been tested to satisfaction, it is merged into the master. An overview of the branching and merging process is provided in the Git documentation here.

The continuous integration and delivery purist are not great fans of branches and prefer the ethos of integrating changes into one place to be rigidly adhered to, ergo one code branch only. However, in practice you will find that most projects have to come up with some sane branching strategy. The subject of branching is a topic in its own right, suffice it to say there is an overhead in applying changes across multiple branches and overheads involved in merging into the master branch. Therefore, there needs to be some governance and rigor applied around the number of branches in the source code repository.

Chris then shows us how to create a multi-branch pipeline with Jenkins.

Deploying SQL Server Code Using Jenkins

Chris Adkin has started a series on continuous integration of SQL Server databases with Jenkins and Docker:

The mainstay of my presentation material this year has been my deck on continuous integration, Docker and Jenkins. For people who have not had the chance to see this presentation or have seen it and wanted to get some more context around it, I have written this first in a series of posts. Much, in fact just about all of the material in this post features in other posts on my blog. The aim of this set of posts is to present the material in a more digestible manner for people who might not be fully fully familiar with Docker and Jenkins.

This first post will cover an introduction to Jenkins and use of the “Sidecar pattern” for deploying DACPACs to. Subsequent posts will expand on this to include:

  • Multi branch build pipelines
  • Parallelism
  • Unit testing with tSQLt
  • The management of database state via Docker volumes

Many people in the SQL Server community have displayed a great interest in containers, only to be left scratching their heads thinking “Well, that is nice, but what can I practically use them for ?”. In my humble opinion, spinning up SQL Server inside a container as a deployment target for a continuous integration pipeline, is one of the, if not the best ways to leverage SQL Server and Docker.

I’m looking forward to the rest of the series.

Tracking Deployment Details

Andy Leonard tells a story whose moral is that you need to keep track of what you deploy:

But this had to be done.
Right now.

I thanked Geoff and hung up the phone. I then made another judgment call and exercised yet more of my ETL Architect authority. I assigned the PrUAT ticket to myself, logged into PrUAT, executed the patch, copied the output of the execution to the Notes field of the ticket (as we’d trained all DBAs and Release Management people to do), and then manually verified the patch was, in fact, deployed to PrUAT.

I closed the ticket and called my boss. “Done. And verified,” I said. My boss replied, “Good,” and hung up. He passed the good news up the chain.

A funny thing happened the next morning. And by “funny,” I mean no-fun-at-all. My boss called and asked, “Andy? I thought you said the patch was was deployed to PrUAT.” I was a little stunned, grappling with the implications of the accusation. He continued, “The process failed again last night and vendor checks were – again – not cut.” I finally stammered, “Let me check on it and get back to you.”

It’s a good story and really sells the idea that you have to track deployment details, including when you’re doing manual deployments.

Updating A Table Using Change Data Capture Without Downtime

Robert Blackburn takes us through the steps of updating a table which uses Change Data Capture without taking a downtime window:

Steps

  1. Stop jobs that process CDC (SSIS).

  2. Inside a transaction with isolation level serializable: Alter Table schema and create temporary CDC table

  3. Copy old CDC rows to new table excluding dup rows (based on [__$seqval])

  4. Disable old (original) CDC table (schema is outdated). Will drop table

Click through for the rest of the steps and an example script.

Building SQL Server Database Projects With YAML

Gavin Campbell shows how to use Visual Studio Team Services’s new YAML build pipeline to build a SQL Server Data Tools project:

The documentation for how to specify build steps in YAML is still a work in progress. In summary, the current procedure is to visit the VSTS Tasks repo on GitHub, open the folder for the task your are interested in, and take a look at the task.json.

In our case, the first task we need is MSBuild, to build the database project. Looking inside task.json, we can see that the name of the task we need is MSBuild, and that there are a huge number of available inputs we can use to configure the task; solution to specify the project or solution to build, platformconfiguration, and many more. In our case, we’ll just specify the path to our .sqlproj file and let msbuild take care of the rest.

Gavin walks us through the entire process step by step, though as he notes, some of these steps are liable to change as the VSTS team continues implementation.

Wrapping Up A Data Science Project

I have finished my series on launching a data science project.  First, I have a post on deploying models as microservices:

The other big shift is a shift away from single, large services which try to solve all of the problems.  Instead, we’ve entered the era of the microservice:  a small service dedicated to providing a single answer to a single problem.  A microservice architecture lets us build smaller applications geared toward solving the domain problem rather than trying to solve the integration problem.  Although you can definitely configure other forms of interoperation, most microservices typically are exposed via web calls and that’s the scenario I’ll discuss today.  The biggest benefit to setting up a microservice this way is that I can write my service in R, you can call it from your Python service, and then some .NET service could call yours, and nobody cares about the particular languages used because they all speak over a common, known protocol.

One concern here is that you don’t want to waste your analysts time learning how to build web services, and that’s where data science workbenches and deployment tools like DeployRcome into play.  These make it easier to deploy scalable predictive services, allowing practitioners to build their R scripts, push them to a service, and let that service host the models and turn function calls into API calls automatically.

But if you already have application development skills on your team, you can make use of other patterns.  Let me give two examples of patterns that my team has used to solve specific problems.

Then, I talk about the iterative nature of post-deployment life:

At this point in the data science process, we’ve launched a product into production.  Now it’s time to kick back and hibernate for two months, right?  Yeah, about that…

Just because you’ve got your project in production doesn’t mean you’re done.  First of all, it’s important to keep checking the efficacy of your models.  Shift happens, where a model might have been good at one point in time but becomes progressively worse as circumstances change.  Some models are fairly stable, where they can last for years without significant modification; others have unstable underlying trends, to the point that you might need to retrain such a model continuously.  You might also find out that your training and testing data was not truly indicative of real-world data, especially that the real world is a lot messier than what you trained against.

The best way to guard against unbeknownst model shift is to take new production data and retrain the model.  This works best if you can keep track of your model’s predictions versus actual outcomes; that way, you can tell the actual efficacy of the model, figuring out how frequently and by how much your model was wrong.

This was a fun series to write and will be interesting to come back to in a couple of years to see how much I disagree with the me of now.

Categories

April 2019
MTWTFSS
« Mar  
1234567
891011121314
15161718192021
22232425262728
2930