Team Data Science Process Updates

David Smith announces updates to the Team Data Science Process:

It’s been over a year since we first introduced introduced the Team Data Science Process (TDSP). The data, technology and practices behind Data Science continue to evolve, and the TDSP has evolved in parallel. Over the past year, several new facets have been added, including:

  • The IDEAR (Interactive Data Exploration, Analysis and Reporting) framework, an open source extension to R and Python designed to standardize the process of data exploration and reporting;

  • Guidance for use of Spark 2.0, including an end-to-end Spark v2.0 walkthrough;

  • Guidance for use of in-database Python with SQL Server, including an end-to-end in-database Python tutorial;

Click through for more changes, as well as links to further resources.

Related Posts

P-Hacking and Multiple Comparison Bias

Patrick David has a great article on hypothesis testing, p-hacking, and multiple comparison bias: The most important part of hypothesis testing is being clear what question we are trying to answer. In our case we are asking:“Could the most extreme value happen by chance?”The most extreme value we define as the greatest absolute AMVR deviation from […]

Read More

Feature And Text Classification Using Naive Bayes In R

I wrap up my series on the Naive Bayes class of algorithms, finally writing some code along the way: Now we’re going to look at movie reviews and predict whether a movie review is a positive or a negative review based on its words. If you want to play along at home, grab the data set, […]

Read More

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930