
Author: Kevin Feasel

Installing Python Support In SQL Server

Ginger Grant has a teaser for her upcoming 24 Hours of PASS talk:

The process for using Python in SQL Server is very similar to the previous process of installing R.  Microsoft renamed R Services to Machine Learning Services, and now allows both R and Python to be installed, as shown in the screenshot.  Microsoft’s version of Python uses Anaconda, an open source analytics platform created by Continuum. This is where Python differs from other open source languages: Continuum provides the distribution because it contains data science components which are not included in the standard distribution of Python. Continuum also sells an enterprise version of Anaconda with, of course, more features than come with the free version. It is important to remember the Python environment, as you will need to select the same distribution when running Python code outside of SQL Server.

Read on to see how to install Python support in SQL Server 2017 and for a few links to tools.
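
If you want to verify the installation afterward, here is a minimal sketch, assuming the Machine Learning Services (Python) feature is installed: enable external scripts and then run a trivial Python script through sp_execute_external_script.

-- Allow the instance to run external R/Python scripts (you may need to restart
-- the instance or the Launchpad service before the run value takes effect).
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;

-- Smoke test: hand a one-row result set to Python and echo it back.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT 1 AS python_works;';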


Parameter Sniffing On Conditional Statements

Kendra Little explains that SQL Server will sniff and cache parameter values even for statements it never executes:

The first time that dbo.ReviewFlags is executed after the database comes online, it’s with an invalid parameter, like this:

EXEC dbo.ReviewFlags @Flag = null;
GO

This is caught by the IF block, hits the RAISERROR, and goes down to the THROW block, and the output is:

Msg 50000, Level 11, State 1, Procedure ReviewFlags, Line 8 [Batch Start Line 70]
@Flag must be a value between 1 and 5

But even though SQL Server didn’t execute the SELECT statement, it still compiled it. And it also cached the plan.

Read on to understand the trouble this can cause, as well as a few ways of solving the problem.  This is a special case of parameter sniffing problems, but the solutions are the same as in the general case.
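
To make the pattern concrete, here is a rough sketch of a procedure in that shape; the dbo.Flags table and its columns are my own invention rather than Kendra’s actual schema, but the behavior is the same: the SELECT is compiled and its plan cached with @Flag = NULL even though validation stops it from running.

CREATE OR ALTER PROCEDURE dbo.ReviewFlags
    @Flag INT
AS
BEGIN
    IF (@Flag IS NULL OR @Flag NOT BETWEEN 1 AND 5)
    BEGIN
        RAISERROR('@Flag must be a value between 1 and 5', 11, 1);
        RETURN;
    END;

    -- Compiled and cached on the first call, even when the validation above
    -- short-circuits execution and the statement never actually runs.
    SELECT COUNT(*) AS FlagCount
    FROM dbo.Flags
    WHERE FlagValue = @Flag;
    -- One common mitigation is OPTION (RECOMPILE) on this statement, so a
    -- parameter value sniffed from a failed call cannot stick around.
END;
GO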


STOPAT Date Formats

Dave Mason notes that the STOPAT date option when restoring a log backup is temperamental:

There’s nothing I see in the documentation regarding the format for “time”. But there are a couple of examples, including this one:

RESTORE LOG AdventureWorks  
FROM AdventureWorksBackups  
WITH FILE=4, NORECOVERY, STOPAT = 'Apr 15, 2020 12:00 AM';

That string looks suspiciously like a US English date format. I suspect that wouldn’t work for languages that don’t recognize “Apr” as a month. And what if the date is displayed in one of the many date formats used outside of the US? Let’s find out!

Dave tried 21 different date formats; click through for the results.
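
If you’d rather not rely on language settings at all, one option (my suggestion, not necessarily Dave’s conclusion) is the ISO 8601 form with the T separator, which SQL Server interprets the same way regardless of the session’s language or DATEFORMAT settings:

-- ISO 8601 literal: unambiguous for any language setting.
RESTORE LOG AdventureWorks
FROM AdventureWorksBackups
WITH FILE = 4, NORECOVERY, STOPAT = '2020-04-15T00:00:00';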


Using bsts In R

Steven L. Scott explains what the bsts package does:

Time series data appear in a surprising number of applications, ranging from business, to the physical and social sciences, to health, medicine, and engineering. Forecasting (e.g. next month’s sales) is common in problems involving time series data, but explanatory models (e.g. finding drivers of sales) are also important. Time series data are having something of a moment in the tech blogs right now, with Facebook announcing their “Prophet” system for time series forecasting (Taylor and Letham 2017), and Google posting about its forecasting system in this blog (Tassone and Rohani 2017).

This post summarizes the bsts R package, a tool for fitting Bayesian structural time series models. These are a widely useful class of time series models, known in various literatures as “structural time series,” “state space models,” “Kalman filter models,” and “dynamic linear models,” among others. Though the models need not be fit using Bayesian methods, they have a Bayesian flavor and the bsts package was built to use Bayesian posterior sampling.

If you’re looking for time series models, this looks like a good one.


Data Cleaning Tips

Michael Grogan has a few tips for data cleaning with R:

6. Delete observations using head and tail functions

The head and tail functions can be used if we wish to delete certain observations from a variable, e.g. Sales. With a negative argument, the head function drops the last 30 rows, while the tail function drops the first 30 rows.

When it comes to using a variable edited in this way for calculation purposes, e.g. a regression, the as.matrix function is also used to convert the variable into matrix format:

Salesminus30days <- head(Sales, -30)   # drop the last 30 observations
X1 <- as.matrix(Salesminus30days)
X1

Salesplus30days <- tail(Sales, -30)    # drop the first 30 observations
X2 <- as.matrix(Salesplus30days)
X2

Some of these tips are for people familiar with Excel but fairly new to R.  These also use the base library rather than the tidyverse packages (e.g., using merge instead of dplyr’s joins, or as.Date instead of lubridate).  You may consider that a small negative, but if it is, it’s a very small one.


Useful dplyr Functions

S. Richter-Walsh explains seven important dplyr functions with plenty of examples:

There are many useful functions contained within the dplyr package. This post does not attempt to cover them all but does look at the major functions that are commonly used in data manipulation tasks. These are:

select() 
filter()
mutate() 
group_by() 
summarise()
arrange() 
join()

The data used in this post are taken from the UCI Machine Learning Repository and contain census information from 1994 for the USA. The dataset can be used for classification of income class in a machine learning setting and can be obtained here.

That’s probably the bare minimum you should know about dplyr, but knowing just these seven can make data analysis in R much easier.


Streaming ETL Using CDC And Event Hub

Rolf Tesmer combines Change Data Capture and Event Hubs to build a streaming ETL solution:

The solution picks up the SQL data changes from the CDC Change Tracking system tables, creates JSON messages from the change rows, and then posts the message to an Azure Event Hub.  Once landed in the Event Hub an Azure Stream Analytics (ASA) Job distributes the changes into the multiple outputs.

What I found pretty cool was that I could transmit SQL delta changes from source to target in as little as 5 seconds end to end!

There are a bunch of steps, but the end result is worth it.
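
On the SQL Server side, the CDC portion boils down to a couple of system procedure calls. A minimal sketch, where dbo.Orders is a hypothetical source table and all of the Event Hub and Stream Analytics plumbing lives outside of T-SQL:

-- Enable Change Data Capture at the database level (requires sysadmin).
USE SourceDb;   -- hypothetical database name
GO
EXEC sys.sp_cdc_enable_db;
GO

-- Track changes on a specific table; CDC writes them to a change table,
-- which the extraction job reads and turns into JSON messages.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL,
    @supports_net_changes = 1;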


Check Where That Backup’s Restoring To

Shane O’Neill “has a friend” who learned an important lesson about the database restore GUI:

GUIs are good for….

…discovery.

They give you the option to script out the configurations you have chosen. If my friend had chosen to script out the restore, rather than clicking “OK” to run it, maybe he would have caught this mistake when reviewing it – rather than overwriting the Live database with 2 week old data and spending a weekend in the office with 3 colleagues fixing it.

Plus if you ever want to ensure that you know something, try and script it out from scratch.

Read the whole thing; good thing that totally didn’t happen to Shane and was just his friend!
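
For what it’s worth, the scripted equivalent is short enough that reviewing it costs almost nothing. A hedged sketch with made-up database and file names:

-- Scripting the restore makes the target database name (and the REPLACE
-- option) impossible to miss before you hit execute.
RESTORE DATABASE StagingDb   -- double-check this is NOT the live database
FROM DISK = N'\\backupshare\StagingDb_full.bak'   -- hypothetical path
WITH MOVE N'StagingDb_Data' TO N'D:\Data\StagingDb.mdf',
     MOVE N'StagingDb_Log'  TO N'L:\Log\StagingDb.ldf',
     REPLACE, STATS = 10;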


Save Early, Save Often

Kenneth Fisher relays an important life lesson:

So years and years ago, when I was in college, one of my favorite classes was Assembly Language. We were working with Mac Assembly in case anyone is interested (yes I used a Mac at school, one of the big ones that had the monitor built into it). Somewhere around week three or four, we were supposed to print something to the screen. I spent several hours (this was only my second programming class so even Hello World was a challenge) and got my program ready to test. It worked! Sort of.

Hello World was written to the top of the screen! Then a second or so later the bottom half of the screen turned into random ASCII garbage. Then a second or so later the computer rebooted. Well, that’s not good. Time to debug!

So the computer comes back up, I take a look, and I don’t have ANY code. I hadn’t saved (and this was long enough ago there was no auto-save). I had to start ALL over again. In the end, I did manage to re-write my code, got it working and even got an A. I also learned that I needed to save my work before running it. Well, learned my lesson for the first time (of many).

I have attempted to put a sanguine spin on this mishap, based on something Phil Factor once wrote:  if you throw away (or lose) the code the first time around, the second time you write it, the code will probably be better.  This is because the first time you’re writing a set of code, you’re trying to force the pieces together and get the code working; the second time around, you have a working algorithm in mind, so the code will likely be much cleaner.


Changing A Large Table’s Clustered Index

Nate Johnson explains how to change the clustered index on a very large table:

I call this the “setup, dump, & swap”.  Essentially we need to create an empty copy of the table, with the desired index(es), dump all the data into it, and then swap it in.  There are a couple of ways you can do this, but it boils down to the same basic premise: it’s “better” (probably not in terms of speed, but definitely in terms of efficiency and overhead) to fill this new copy of the table and its indexes than it is to build the desired index on the existing table.

As Nate notes, “very large” here will depend on your environment, but this is a useful technique because the old table can be live until the moment of the swap.  As it happens, I’m in the middle of one of these sorts of swaps as I write this, one that will take a week or two to finish due to pacing.
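
For reference, a stripped-down sketch of the setup, dump, & swap; the table and column names are hypothetical, and Nate’s post covers the nonclustered indexes, constraints, and batching you’d need in real life:

-- 1. Setup: an empty copy of the table with the desired clustered index.
CREATE TABLE dbo.BigTable_New
(
    Id        BIGINT       NOT NULL,
    EventDate DATETIME2    NOT NULL,
    Payload   VARCHAR(100) NULL
);
CREATE CLUSTERED INDEX CIX_BigTable_New
    ON dbo.BigTable_New (EventDate, Id);

-- 2. Dump: copy the data across, ideally in batches to keep the log in check.
INSERT INTO dbo.BigTable_New (Id, EventDate, Payload)
SELECT Id, EventDate, Payload
FROM dbo.BigTable;

-- 3. Swap: metadata-only renames inside a transaction.
BEGIN TRAN;
    EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
    EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
COMMIT;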
