DBA Salary Calculations

Eugene Meidinger takes a whack at the data professional salary survey:

So I’m using something called a multiple linear regression to make a formula to predict your salary based on specific variables. Unfortunately, the highest Coefficient of Determination (or R²) I’ve been able to get is 0.37. Which means, as far as I understand it, that at most the model explains 37% of the variation.
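For a feel of what that R² number measures, here is a minimal sketch of an ordinary least squares fit with NumPy. It is not Eugene's model or the survey data: the predictors (`years_experience`, `manages_staff`), the coefficients, and the noise level are all made up for illustration, and `fit_linear_regression` is a hypothetical helper name.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, r_squared)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(np.sum(residuals ** 2))       # variation the model leaves unexplained
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation around the mean
    return coef, 1.0 - ss_res / ss_tot

# Synthetic data, NOT the salary survey: every number below is an assumption.
rng = np.random.default_rng(0)
n = 500
years_experience = rng.uniform(0, 25, n)
manages_staff = rng.integers(0, 2, n)  # 0 or 1
noise = rng.normal(0, 25_000, n)
salary = 60_000 + 2_000 * years_experience + 10_000 * manages_staff + noise

X = np.column_stack([years_experience, manages_staff])
coef, r_squared = fit_linear_regression(X, salary)
print("intercept and slopes:", coef)
print("R squared:", r_squared)
```

R² is one minus the ratio of leftover variation to total variation, so when the noise is large relative to the effect of the predictors, even a correctly specified model ends up with a modest value.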

Additionally the spread on the results isn’t great either. The standard deviation, a measure of spread, is about $25,000 on the original subset of data. Which means we’d expect 68% to be within +/- $25,000 of the average and 95% to be within +/- $50,000 of the average. So what happens when we apply our model?
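The 68% and 95% figures come from the usual rule of thumb for a normal distribution, which is an assumption here: the excerpt doesn't say whether the salary data is actually normally distributed. A quick sketch with the standard library shows where the numbers come from, using the $25,000 standard deviation from the quote (`share_within` is a hypothetical helper name):

```python
from statistics import NormalDist

SALARY_SD = 25_000  # standard deviation quoted above

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    # The share does not depend on the mean or scale, so a standard normal suffices.
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2):
    print(f"within +/- ${k * SALARY_SD:,} of the average: {share_within(k):.1%}")
# within +/- $25,000 of the average: 68.3%
# within +/- $50,000 of the average: 95.4%
```

If the real distribution is skewed, as salary data often is, the actual shares can differ from these.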

Read on for Eugene’s early findings and a roadmap for additional posts.
