The Year Of The Data Engineer

Alex Woodie points out that data science also requires data engineers:

The shortage of data scientists – those triple-threat types who possess advanced statistics, business, and coding skills – has been well-documented over the years. But increasingly, businesses are facing a shortage of another key individual on the big data team who’s critical to achieving success – the data engineer.

Data engineers are experts in designing, building, and maintaining the data-based systems in support of an organization’s analytical and transactional operations. While they don’t boast the quantitative skills that a data scientist would use to, say, build a complex machine learning model, data engineers do much of the other work required to support that data science workload, such as:

  • Building data pipelines to collect data and move it into storage;

  • Preparing the data as part of an ETL or ELT process;

  • Stitching the data together with scripting languages;

  • Working with the DBA to construct data stores;

  • Ensuring the data is ready for use;

  • Using frameworks and microservices to serve data.

Read the whole thing.  My experience is that most shops looking to hire a data scientist really need to get data engineers first; otherwise, you’re wasting that high-priced data scientist’s time.  The plus side is that if you’re already a database developer, getting into data engineering is much easier than mastering statistics or neural networks.

Related Posts

Azure Data Factory and Schema Drift

Mark Kromer walks us through two techniques we can use in Azure Data Factory to deal with schema drift: Azure Data Factory’s Mapping Data Flows have built-in capabilities to handle complex ETL scenarios that include the ability to handle flexible schemas and changing source data. We call this capability “schema drift“. When you build transformations […]

Read More

Troubleshooting AWS Database Migration Service Errors

Samir Behara takes us through troubleshooting AWS Database Migration Service issues: For troubleshooting any issues with AWS DMS, it is necessary to have logs enabled. The DMS logs would typically give a better picture and helps find errors or warnings that would indicate the root cause of the failure. If the logs are not available […]

Read More

Categories

February 2018
MTWTFSS
« Jan Mar »
 1234
567891011
12131415161718
19202122232425
262728