In the past we used ETL techniques purely within the data-warehousing and analytic space. But, if one considers why and what ETL is doing, it is actually a lot more applicable as a broader concept.
- Extract: Data is available from a source system
- Transform: We want to filter, cleanse or otherwise enrich this source data
- Load: Make the data available to another application
There are two key concepts here:
- Data is created by an application, and we want it to be available to other applications
- We often want to process the data (for example, cleanse and apply business logic to it) before it is used
Thinking about many applications being built nowadays, particularly in the microservices and event-driven space, we recognize that what they do is take data from one or more systems, manipulate it and then pass it on to another application or system. For example, a fraud detection service will take data from merchant transactions, apply a fraud detection model and write the results to a store such as Elasticsearch for review by an expert. Can you spot the similarity to the above outline? Is this a microservice or ETL process?
Things like this are reason #1 why I expect data platform jobs (administrator and developer) to be around decades from now. The set of tools expand, but the nature of the job remains similar.