Data Engineering Remains As Important As Ever

Prashanth Southekal has good news for ETL developers:

While many companies have embarked on data analytics initiatives, only a few have been successful. Studies have shown that over 70% of data analytics programs fail to realize their full potential and over 80% of the digital transformation initiatives fail. While there are many reasons that affect successful deployment of data analytics, one fundamental reason is lack of good quality data. However, many business enterprises realize this and invest considerable time and effort in data cleansing and remediation; technically known as data engineering. It is estimated that about 60 to 70% of the effort in data analytics is on data engineering. Given that data quality is an essential requirement for analytics, there are 5 key reasons on why data analytics is heavy on data engineering.

1.Different systems and technology mechanisms to integrate data.

Business systems are designed and implemented for a purpose; mainly for recording business transactions. The mechanisms for data capture in Business systems such as ERP is batch/discrete data while in the SCADA/IoT Field Systems it is for continuous/time-series data. This means that these business systems store diverse data types caused by the velocity, volume, and variety dimensions in the data. Hence the technology (including the database itself) to capture data is varied and complex.  And when you are trying to integrate data from these diverse systems from different vendors, the metadata model varies resulting in data integration challenges.

That 60-70% on data engineering is probably a moderate underestimate.

Related Posts

Kafka And The Differing Aims Of Data Professionals

Kai Waehner argues that there is an impedence mismatch between data engineers, data scientists, and ML production engineers: Data scientists love Python, period. Therefore, the majority of machine learning/deep learning frameworks focus on Python APIs. Both the stablest and most cutting edge APIs, as well as the majority of examples and tutorials use Python APIs. […]

Read More

Saving An ADF Pipeline As A Template

Rayis Imayev shares with us how you can save an Azure Data Factory pipeline as a template: Azure Data Factory (ADF) provides you with a framework for creating data transformation solutions in the Microsoft cloud environment. Recently introduced Template Gallery for ADF pipelines can speed up this development process and provide you with helpful information to create additional activity […]

Read More

Categories

July 2018
MTWTFSS
« Jun Aug »
 1
2345678
9101112131415
16171819202122
23242526272829
3031