The Year Of The Data Engineer

Alex Woodie points out that data science also requires data engineers:

The shortage of data scientists – those triple-threat types who possess advanced statistics, business, and coding skills – has been well-documented over the years. But increasingly, businesses are facing a shortage of another key individual on the big data team who’s critical to achieving success – the data engineer.

Data engineers are experts in designing, building, and maintaining the data-based systems in support of an organization’s analytical and transactional operations. While they don’t boast the quantitative skills that a data scientist would use to, say, build a complex machine learning model, data engineers do much of the other work required to support that data science workload, such as:

  • Building data pipelines to collect data and move it into storage;

  • Preparing the data as part of an ETL or ELT process;

  • Stitching the data together with scripting languages;

  • Working with the DBA to construct data stores;

  • Ensuring the data is ready for use;

  • Using frameworks and microservices to serve data.

Read the whole thing.  My experience is that most shops looking to hire a data scientist really need to get data engineers first; otherwise, you’re wasting that high-priced data scientist’s time.  The plus side is that if you’re already a database developer, getting into data engineering is much easier than mastering statistics or neural networks.

Related Posts


John Mount explains the vtreat package that he and Nina Zumel have put together: When attempting predictive modeling with real-world data you quicklyrun into difficulties beyond what is typically emphasized in machine learning coursework: Missing, invalid, or out of range values. Categorical variables with large sets of possible levels. Novel categorical levels discovered during test, cross-validation, or […]

Read More

Wrapping Up A Data Science Project

I have finished my series on launching a data science project.  First, I have a post on deploying models as microservices: The other big shift is a shift away from single, large services which try to solve all of the problems.  Instead, we’ve entered the era of the microservice:  a small service dedicated to providing […]

Read More


February 2018
« Jan Mar »