Descriptive Statistics With SQL Server And R

Kevin Feasel

2016-07-19

R

Mala Mahadevan digs into descriptive statistics:

With R integration into SQL Server 2016 we can pull an R script and integrate it rather easily. I will be covering all 3 approaches. I am using a small dataset – a single table with 915 rows, with a SQL Server 2016 installation and R Studio. The complexities of doing this type of analysis in the real world with bigger datasets involve setting various options for performance and dealing with memory issues – because R is very memory intensive and single threaded.

My table and the data it contains can be created with scripts here. For this specific post I used just one column in the table – age. For further posts I will be using the other fields such as country and gender.

Mala compares T-SQL versus R for calculating minimum, maximum, mean, and mode.  She wraps the post up by showing how to call her R code via T-SQL using SQL Server R Services.

Related Posts

Using Cohen’s D for Experiments

Nina Zumel takes us through Cohen’s D, a useful tool for determining effect sizes in experiments: Cohen’s d is a measure of effect size for the difference of two means that takes the variance of the population into account. It’s defined asd = | μ1 – μ2 | / σpooledwhere σpooled is the pooled standard deviation over both cohorts. […]

Read More

Comparing Iterator Performance in R

Ulrik Stervbo has a performance comparison for for, apply, and map functions in R: It is usually said, that for– and while-loops should be avoided in R. I was curious about just how the different alternatives compare in terms of speed. The first loop is perhaps the worst I can think of – the return vector is […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031