Press "Enter" to skip to content

Using a Spark Listener

Bipin Patwardhan shares with us an event ingestion engine for Apache Spark:

In the last quarter of 2019, I developed a meta-data driven, ingestion engine using Spark. The framework /library has multiple patterns to cater to multiple source and destination combinations. For example, two patterns are available for loading flat files to cloud storage (one to load data to AWS S3 and another to load data to Azure Blob).

As data loading philosophies have changed from Extract-Transform-Load (ETL) to Extract-Load-Transform (ETL), such a framework is very useful, as it reduces the time needed to set up ingestion jobs.

Is anyone else getting Integration Services or Informatica flashbacks? Because I sure am.