The Costs of Specialization within Data Science

Eric Colson argues in favor of data science generalists rather than specialists:

But the goal of data science is not to execute. Rather, the goal is to learn and develop profound new business capabilities. Algorithmic products and services like recommendations systemsclient engagement banditsstyle preference classificationsize matchingfashion design systemslogistics optimizersseasonal trend detection, and more can’t be designed up-front. They need to be learned. There are no blueprints to follow; these are novel capabilities with inherent uncertainty. Coefficients, models, model types, hyper parameters, all the elements you’ll need must be learned through experimentation, trial and error, and iteration. With pins, the learning and design are done up-front, before you produce them. With data science, you learn as you go, not before you go.

In the pin factory, when learning comes first we do not expect, nor do we want, the workers to improvise on any aspect the product, except to produce it more efficiently. Organizing by function makes sense since task specialization leads to process efficiencies and production consistency (no variations in the end product).

I think this article captures the downside risk of specialization, but not the downside risks of generalization: some people simply aren’t very good at some things, leading to huge amounts of technical debt down the road or failing a project due to the lack of requisite knowledge or skills. To give a personal example, I have a generalist team, but I still control the data flows (at the very least doing thorough code reviews of any database changes), my application specialist controls app architecture, my statistician reviews algorithms, etc. I don’t claim that this is the best strategy, but a group of pure generalists will have their own set of problems too.

Related Posts

Linear Programming in Python

Francisco Alvarez shows us an example of linear programming in Python: The first two constraints, x1 ≥ 0 and x2 ≥ 0 are called nonnegativity constraints. The other constraints are then called the main constraints. The function to be maximized (or minimized) is called the objective function. Here, the objective function is x1 + x2. Two classes of […]

Read More

Exploratory Data Analysis with inspectdf

Laura Ellis continues a dive into Exploratory Data Analysis, this time using the inspectdf package: I like this package because it’s got a lot of functionality and it’s incredibly straightforward to use. In short, it allows you to understand and visualize column types, sizes, values, value imbalance & distributions as well as correlations. Better yet, […]

Read More


March 2019
« Feb Apr »