Building A Python Project Template

Henk Griffioen shows how to create a standardized project in Python, focusing on data science scenarios:

Project structures often organically grow to suit people’s needs, leading to different project structures within a team. You can consider yourself lucky if at some point in time you find, or someone in your team finds, a obscure blog post with a somewhat sane structure and enforces it in your team.

Many years ago I stumbled upon ProjectTemplate for R. Since then I’ve tried to get people to use a good project structure. More recently DrivenData (what’s in a name?) released their more generic Cookiecutter Data Science.

The main philosophies of those projects are:

  • A consistent and well-organized structure allows people to collaborate more easily.

  • Your analyses should be reproducible and your structure should enable that.

  • A projects starts from raw data that should never be edited; consider raw data immutable and only edit derived sources.

This is a set of prescriptions and focuses on the phase before the project actually kicks off.

Related Posts


John Mount explains the vtreat package that he and Nina Zumel have put together: When attempting predictive modeling with real-world data you quicklyrun into difficulties beyond what is typically emphasized in machine learning coursework: Missing, invalid, or out of range values. Categorical variables with large sets of possible levels. Novel categorical levels discovered during test, cross-validation, or […]

Read More

Wrapping Up A Data Science Project

I have finished my series on launching a data science project.  First, I have a post on deploying models as microservices: The other big shift is a shift away from single, large services which try to solve all of the problems.  Instead, we’ve entered the era of the microservice:  a small service dedicated to providing […]

Read More


March 2017
« Feb Apr »