Linear Regression In SQL

Phil Factor shows how to generate a quick linear regression using SQL, Powershell, and Gnuplot:

It looks a bit like someone has fired a shotgun at a wall but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here so we can be fairly confident of plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It’s good to know that this is possible, but I’d switch to R or Python long before.

Related Posts

Calculating TF-IDF Using Apache Spark

Arseniy Tashoyan shows us how to calculate Term Frequency-Inverse Document Frequency using Apache Spark: TF-IDF is used in a large variety of applications. Typical use cases include: Document search. Document tagging. Text preprocessing and feature vector engineering for Machine Learning algorithms. There is a vast number of resources on the web explaining the concept itself […]

Read More

Randomization With NEWID()

Michael J. Swart tests whether ORDER BY NEWID() produces a biased result: One of his articles, Visualizing Algorithms has some thoughts on shuffling at He says that sorting using a random comparator is a rotten way to shuffle things. Not only is it inefficient, but the resulting shuffle is really really biased. He goes on to visualize that […]

Read More


April 2017
« Mar May »