The documentation for how to specify build steps in YAML is still a work in progress. In summary, the current procedure is to visit the VSTS Tasks repo on GitHub, open the folder for the task your are interested in, and take a look at the
In our case, the first task we need is MSBuild, to build the database project. Looking inside task.json, we can see that the name of the task we need is
MSBuild, and that there are a huge number of available
inputswe can use to configure the task;
solutionto specify the project or solution to build,
configuration, and many more. In our case, we’ll just specify the path to our
.sqlprojfile and let
msbuildtake care of the rest.
Gavin walks us through the entire process step by step, though as he notes, some of these steps are liable to change as the VSTS team continues implementation.
I have finished my series on launching a data science project. First, I have a post on deploying models as microservices:
The other big shift is a shift away from single, large services which try to solve all of the problems. Instead, we’ve entered the era of the microservice: a small service dedicated to providing a single answer to a single problem. A microservice architecture lets us build smaller applications geared toward solving the domain problem rather than trying to solve the integration problem. Although you can definitely configure other forms of interoperation, most microservices typically are exposed via web calls and that’s the scenario I’ll discuss today. The biggest benefit to setting up a microservice this way is that I can write my service in R, you can call it from your Python service, and then some .NET service could call yours, and nobody cares about the particular languages used because they all speak over a common, known protocol.
One concern here is that you don’t want to waste your analysts time learning how to build web services, and that’s where data science workbenches and deployment tools like DeployRcome into play. These make it easier to deploy scalable predictive services, allowing practitioners to build their R scripts, push them to a service, and let that service host the models and turn function calls into API calls automatically.
But if you already have application development skills on your team, you can make use of other patterns. Let me give two examples of patterns that my team has used to solve specific problems.
At this point in the data science process, we’ve launched a product into production. Now it’s time to kick back and hibernate for two months, right? Yeah, about that…
Just because you’ve got your project in production doesn’t mean you’re done. First of all, it’s important to keep checking the efficacy of your models. Shift happens, where a model might have been good at one point in time but becomes progressively worse as circumstances change. Some models are fairly stable, where they can last for years without significant modification; others have unstable underlying trends, to the point that you might need to retrain such a model continuously. You might also find out that your training and testing data was not truly indicative of real-world data, especially that the real world is a lot messier than what you trained against.
The best way to guard against unbeknownst model shift is to take new production data and retrain the model. This works best if you can keep track of your model’s predictions versus actual outcomes; that way, you can tell the actual efficacy of the model, figuring out how frequently and by how much your model was wrong.
This was a fun series to write and will be interesting to come back to in a couple of years to see how much I disagree with the me of now.
No one can deny how large the online support community for data science is. Today, it’s possible to teach yourself Python and other programming languages in a matter of weeks. And if you’re ever in doubt, there’s a StackOverflow thread or something similar waiting to give you the perfect piece of code to help you.
But when it came to pushing it to production, we found very little documentation online. Most data scientists seem to work on Python notebooks in a silo. They process large volumes of data and analyze it — but within the confines of Jupyter Notebooks. And most of the resources we’ve found while growing as data scientists revolve around Jupyter Notebooks.
Another option might be to use JupyterHub.
There’s two things going on here (and one hidden thing):
The first two messages point out that a procedure is referencing the column
ColdRoomSensorNumberwith schemabinding. The reason it’s using schemabinding is because it’s a natively compiled stored procedure. And that tells me that the table
Warehouse.ColdRoomTemperaturesis an In-Memory table. That’s not all. I noticed another wrinkle. The procedure takes a table-valued parameter whose table type contains a column called
ColdRoomSensorLabel. We’re going to have to replace that too. Ugh. Part of me wanted to look for another example.
The last message tells me that the table is a system versioned table. So there’s a corresponding archive table where history is maintained. That has to be dealt with too. Luckily Microsoft has a great article on Changing the Schema of a System-Versioned Temporal Table.
One last thing to worry about is a index on
ColdRoomSensorNumber. That should be replaced with an index on
ColdRoomSensorLabel. SSDT didn’t warn me about that because apparently, it can deal with that pretty nicely.
I’m glad that Michael went with a more complex example—it’s easy to tell this story with a simple procedure versioning, but in seeing a larger change, you can see the rhythm in the process—it’s all the same pattern of steps over and over.
There are two main kinds of SQL queries. SELECT/INSERT/UPDATE/DELETE statements are examples of Data Manipulation Language (DML). CREATE/ALTER/DROP statements are examples of Data Definition Language (DDL).
With schema changes – DDL – we have the added complexity of the SCH-M lock. It’s a kind of lock you don’t see with DML statements. DML statements take and hold schema stability locks (SCH-S) on the tables they need. This can cause interesting blocking chains between the two types where new queries can’t start until the schema change succeeds
Click through for suggestions with regard to schema locks, as well as a few tips for modifying large tables.
Procedures are very easy to Blue-Green. Brand new procedures are added during the pre-migration phase. Obsolete procedures are dropped during the post-migration phase.
If the procedure is changing but is logically the same, then it can be altered during the pre-migration phase. This is common when the only change to a procedure is a performance improvement.
But if the procedure is changing in other ways. For instance, when a new parameter is added, or dropped, or the resultset is changing. Then use the Blue-Green method to replace it: During the pre-migration phase, create a new version of the procedure. It must be named differently and the green version of the application has to be updated to call the new procedure. The original blue version of the procedure is deleted during the post-migration phase. It’s not always elegant calling a procedure something like
s_USERS_Create_v2but it works.
This has been a great series so far, and the way he does deployments matches very closely to the way we do them.
When using the Blue-Green method, basically nothing gets changed. Instead everything gets replaced. We start by setting up a new environment – the green environment – and then cut over to it when we’re ready. Once we cut over to the new environment successfully, we’re free to remove the original blue environment. The technique is all about replacing components rather than altering components.
Check it out for a great explanation, not only of how true blue-green doesn’t jibe well with databases, but how to get it to work well enough.
There are 3 modules included. I’m going to write a bigger piece about these functions elsewhere, but for now here is a link to the documentation –
Read on for more information and a link to the source code if you’re interested in learning more.
I have moved many databases to Azure via different methods but I recently came across a new way. Well technically it’s not new, I should say, newly found. The migration was done via the command line which is not exactly ground breaking but it’s nice to have another option.
The idea behind this is simple. Create the bacpac via command line using sqlpackage.exe with the action as export then do an import action into Azure.
Read on for the demo.
Problem 1 Image Tag
There is no image tag specified for the microsoft/mssql-server-linux image, therefore, if Microsoft push a newer version of the image to docker hub, this will be pulled down from docker hub when the build pipeline runs. This is easily fixed by tagging the image with a tag for an explicit version, e.g. microsoft/mssql-server-linux:2017-GA.
Click through for the starting code, two additional issues, and the corrected code.