Team Data Science Process Updates

David Smith announces updates to the Team Data Science Process:

It’s been over a year since we first introduced introduced the Team Data Science Process (TDSP). The data, technology and practices behind Data Science continue to evolve, and the TDSP has evolved in parallel. Over the past year, several new facets have been added, including:

  • The IDEAR (Interactive Data Exploration, Analysis and Reporting) framework, an open source extension to R and Python designed to standardize the process of data exploration and reporting;

  • Guidance for use of Spark 2.0, including an end-to-end Spark v2.0 walkthrough;

  • Guidance for use of in-database Python with SQL Server, including an end-to-end in-database Python tutorial;

Click through for more changes, as well as links to further resources.

Related Posts

Lasso and Ridge Regression in Python

Kristian Larsen shows off a few regression techniques using Python: Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients variables are most strongly associated with the response variable. Therefore, when you conduct a regression model it can be helpful to do a […]

Read More

Using Cohen’s D for Experiments

Nina Zumel takes us through Cohen’s D, a useful tool for determining effect sizes in experiments: Cohen’s d is a measure of effect size for the difference of two means that takes the variance of the population into account. It’s defined asd = | μ1 – μ2 | / σpooledwhere σpooled is the pooled standard deviation over both cohorts. […]

Read More

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930