The Year Of The Data Engineer

Alex Woodie points out that data science also requires data engineers:

The shortage of data scientists – those triple-threat types who possess advanced statistics, business, and coding skills – has been well-documented over the years. But increasingly, businesses are facing a shortage of another key individual on the big data team who’s critical to achieving success – the data engineer.

Data engineers are experts in designing, building, and maintaining the data-based systems in support of an organization’s analytical and transactional operations. While they don’t boast the quantitative skills that a data scientist would use to, say, build a complex machine learning model, data engineers do much of the other work required to support that data science workload, such as:

  • Building data pipelines to collect data and move it into storage;

  • Preparing the data as part of an ETL or ELT process;

  • Stitching the data together with scripting languages;

  • Working with the DBA to construct data stores;

  • Ensuring the data is ready for use;

  • Using frameworks and microservices to serve data.

Read the whole thing.  My experience is that most shops looking to hire a data scientist really need to get data engineers first; otherwise, you’re wasting that high-priced data scientist’s time.  The plus side is that if you’re already a database developer, getting into data engineering is much easier than mastering statistics or neural networks.

Related Posts

Testing Spatial Equilibrium Concepts With tidycensus

Ignacio Sarmiento Barbieri walks us through the concept of spatial equilibrium and tests using data from the tidycensus package: Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy […]

Read More

Interacting With SQL Server From Pandas

Tomaz Kastrun shows how to use pyodbc to interact with a SQL Server database from Pandas: In the SQL Server Management Studio (SSMS), the ease of using external procedure sp_execute_external_script has been (and still will be) discussed many times. But the reason for this short blog post is the fact that, changing Python environments using Conda package/module management within Microsoft […]

Read More

Categories

February 2018
MTWTFSS
« Jan Mar »
 1234
567891011
12131415161718
19202122232425
262728