
Curated SQL Posts

Building ML Services Models

I have part two of my three-part series on SQL Server Machine Learning Services modeling:

We used sp_execute_external_script to build a model and generate some number of days' worth of predictions.  Now I’d like to break that up into two operations:  training a model and generating predictions.  The biggest reason I might want to do this is if model generation is very expensive relative to prediction generation.  It’s obviously not very expensive to build a random number generator following a Poisson distribution, but imagine we had a complicated neural net that took hours or days to train—we’d want to use that model over and over, as long as the model came close enough to representing the underlying reality.

So now we’re going to create a model.

Click through to see a more complete example, something closer to production-ready.
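
In the meantime, here is a minimal sketch of the train-once, predict-many pattern, assuming hypothetical dbo.Models and dbo.DailySales tables rather than the post's exact objects:

```sql
-- Hypothetical storage for serialized models.
CREATE TABLE dbo.Models
(
    ModelName NVARCHAR(50) NOT NULL PRIMARY KEY,
    Model VARBINARY(MAX) NOT NULL
);
GO

-- Training:  serialize the R model object and save it off.
DECLARE @model VARBINARY(MAX);
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # For a Poisson model, "training" is just estimating lambda.
        lambda <- mean(InputDataSet$SalesCount)
        model <- serialize(lambda, connection = NULL)
    ',
    @input_data_1 = N'SELECT SalesCount FROM dbo.DailySales',
    @params = N'@model VARBINARY(MAX) OUTPUT',
    @model = @model OUTPUT;

INSERT INTO dbo.Models (ModelName, Model)
VALUES (N'WatchSales', @model);
GO

-- Prediction:  rehydrate the stored model and generate a week of predictions.
DECLARE @model VARBINARY(MAX) =
    (SELECT Model FROM dbo.Models WHERE ModelName = N'WatchSales');
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        lambda <- unserialize(model)
        OutputDataSet <- data.frame(PredictedSales = rpois(7, lambda))
    ',
    @params = N'@model VARBINARY(MAX)',
    @model = @model
WITH RESULT SETS ((PredictedSales INT));
```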


Make Your PowerShell Cmdlets Hypothetical

Rob Sewell shows you how to implement Confirm, Verbose, and WhatIf parameters in PowerShell cmdlets:

I have done this before because, if the file does not exist, Set-Content will create a new file for you; with this function, I can check whether the file exists first with ValidateScript before running the rest of the function.

As you can see, I add variables from my PowerShell code into “The Item” and “The Change”. If I need to add a property of an object, I use $($Item.Property).

So now, if I want to see what my new function would do if I ran it without actually making any changes, I have -WhatIf added to my function automagically.

It’s easy to do and makes your code that much better.


Using Computed Columns For Pattern Matching

Erik Darling shows how you can use a computed column to speed up LIKE '%something' searches:

We were talking about computed columns, and one of our students mentioned that he uses computed columns that run the REVERSE() function on a column for easier back-searching.

What’s back-searching? It’s a word I just made up.

The easiest example to think about and demo is Social Security Numbers.

One security requirement is often to give the last four.

Obviously running this for a search WHERE ssn LIKE '%0000' would perform badly over large data sets.

One of the trickiest parts of performance tuning is understanding that the way people want to see data is not necessarily the way you should store the data.
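
To make the trick concrete, here is a hedged sketch; the table, column, and index names are hypothetical rather than Erik's demo objects:

```sql
-- Hypothetical table: a reversed computed column turns a trailing-wildcard
-- search into a leading-prefix search, which an index can seek on.
ALTER TABLE dbo.People
    ADD ssn_reversed AS REVERSE(ssn);

CREATE INDEX ix_People_ssn_reversed
    ON dbo.People (ssn_reversed);

-- Leading wildcard:  not sargable, so this scans.
SELECT p.ssn
FROM dbo.People AS p
WHERE p.ssn LIKE '%0000';

-- An SSN ending in 0000 starts with 0000 once reversed, so this can seek.
SELECT p.ssn
FROM dbo.People AS p
WHERE p.ssn_reversed LIKE '0000%';
```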


SQL Persistent Storage In Azure Container Services

Andrew Pruski shows how to use Kubernetes persistent volumes in Azure Container Services:

I’ve been playing around with SQL Server running in Kubernetes in Azure Container Services (AKS) for a while now and I think that the technology is really cool.

You can get a highly available instance of SQL Server up and running with a few lines of code! Ok, there’s a bit of setup to do but once you get that out of the way, you’re good to go.

One thing that has been missing though, is persistent storage. Any changes made to the SQL instance would have been lost if the pod that it was running in failed and was brought back up.

Until now.

Click through to learn how.  It’s certainly not trivial, but Andrew does a good job showing us the step-by-step.


Cycling The SQL Server Error Log

Kenneth Fisher recommends cycling the SQL Server error log regularly:

You’ll note I have it set to keep 32 files. This way I have at least a month’s worth of information. It’s probably more than I need, but for my personal box, that’s what I chose. For your company, it’s going to depend on your needs. You can keep up to 99 files, but that seems a bit excessive to me. Of course, these files aren’t all that large (usually), so it’s up to you.

I’d go with a bit more than 32—if you have three service restarts in March, you won’t have a full month’s worth of information.  Regardless of the exact number you choose, it’s a smart idea to cycle this regularly.
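
For reference, a minimal sketch of both pieces; the retention setting uses xp_instance_regwrite, a commonly used (though undocumented) extended procedure that writes the same registry value as the SSMS error log configuration dialog:

```sql
-- Keep 32 error log files before recycling (the setting behind SSMS's
-- "Configure SQL Server Error Logs" dialog).
EXEC xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs',
    REG_DWORD,
    32;

-- Cycle the error log; run this from a scheduled SQL Agent job so each
-- file covers a predictable window.
EXEC sp_cycle_errorlog;
```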


Diagnosing TempDB Growth

Ginger Keys has a few tips around managing tempdb:

So what happened to cause this inconsistent growth and how should I fix it?

Tempdb is used (and can grow) when users explicitly create objects like temp tables, table variables, cursors, or temporary stored procedures. Tempdb is also used when the database engine creates work tables for sorts or spooling, and for row versioning operations.

It’s often difficult to diagnose what happened on a SQL Server after the fact. There are many resources online to determine what is currently using tempdb but not a lot of information from processes that may have run days ago.

Read on for some tips to help manage tempdb growth better.
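
For the "what is using tempdb right now" side, one simple starting point (illustrative, not Ginger's script) is the session space usage DMV:

```sql
-- Who is allocating tempdb space right now?  Counts are 8 KB pages.
SELECT
    su.session_id,
    su.user_objects_alloc_page_count * 8 AS user_object_kb,
    su.internal_objects_alloc_page_count * 8 AS internal_object_kb
FROM tempdb.sys.dm_db_session_space_usage AS su
WHERE su.session_id > 50   -- rough filter to skip most system sessions
ORDER BY su.internal_objects_alloc_page_count DESC;
```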


Including A Progress Bar In R

Peter Solymos has an update to his pbapply library:

The pbapply R package, which adds a progress bar to vectorized functions, has been known to accumulate overhead when calling parallel::mclapply with forking (see this post for more background on the issue). Strangely enough, a GitHub issue held the key to the solution that I am going to outline below. Long story short: forking is no longer expensive with pbapply, and as it turns out, it never was.

H/T R-Bloggers


Building A Model Using SQL Server ML Services

I have a post which shows how to build a simple R model to predict demand for an item:

I am a huge fan of the Poisson distribution.  It is special in that its one parameter (lambda) represents both the mean and the variance of the distribution.  At the limit, a Poisson distribution becomes normal.  But it’s most useful in helping us model infrequently-occurring events.  For example, selling 3-4 watches per day.

Estimating a Poisson is also easy in R:  lambda is simply the mean of your sample counts, and there is a function called rpois() which takes two parameters:  the number of events you want to generate and the value of lambda.

So what I want to do is take my data from SQL Server, feed it into R, and return back a prediction for the next seven days.

This was a simple post, but the next two in the series will expand upon it and build out a full implementation.
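
The heart of it fits in a single external script call; here is a minimal sketch, assuming a hypothetical dbo.DailySales table with a SalesCount column:

```sql
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # lambda is simply the mean of the sample counts.
        lambda <- mean(InputDataSet$SalesCount)
        # rpois(n, lambda):  n draws from a Poisson with that lambda.
        OutputDataSet <- data.frame(
            DayNumber = 1:7,
            PredictedSales = rpois(7, lambda)
        )
    ',
    @input_data_1 = N'SELECT SalesCount FROM dbo.DailySales'
WITH RESULT SETS ((DayNumber INT, PredictedSales INT));
```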


Exploring The MNIST Dataset

David Robinson performs exploratory data analysis on the MNIST digit database:

The challenge is to classify a handwritten digit based on a 28-by-28 black and white image. MNIST is often credited as one of the first datasets to prove the effectiveness of neural networks.

In a series of posts, I’ll be training classifiers to recognize digits from images, while using data exploration and visualization to build our intuitions about why each method works or doesn’t. Like most of my posts, I’ll be analyzing the data through tidy principles, particularly using the dplyr, tidyr, and ggplot2 packages. In this first post, we’ll focus on exploratory data analysis to show how you can better understand your data before you start training classification algorithms or measuring accuracy. This will help when we’re choosing a model or transforming our features.

Read on for the analysis.


The Whys Of Azure ML Workbench

Ginger Grant explains why Azure Machine Learning Workbench exists:

Microsoft intends Azure Machine Learning Workbench to be more than a tool for machine learning analysis. It is part of a system to manage and monitor the deployment of machine learning solutions with Azure Machine Learning Model Management. The management aspects are part of the application installation.  The Azure Machine Learning Workbench download is available only by creating an account in Microsoft’s Azure environment, where a Machine Learning Model Management resource will be created as part of the install. Within this resource, you will be directed to create a virtual environment in Azure where you will deploy and manage Machine Learning models.

This migration into management of machine learning components is part of a pattern first seen in the on-premises version of data science functionality.  First, Microsoft helped companies manage the deployment of R code with SQL Server 2016, which includes the ability to move R code into SQL Server.  Providing this capability decreased the time it took to implement a data science solution by offering a means by which the code can be deployed easily, without the need for the R code to be re-written or included in another application. SQL Server 2017 expanded on this idea by allowing Python code to be deployed into SQL Server as well.  With the cloud service Model Management, Microsoft is hoping to centralize the implementation so that all Machine Learning services created can be managed in one place.

Read on for more.
