Shankar Selvam discusses one company’s tool for bringing relational data into a data lake:
The next step in building this pipeline is to configure the sink or destination for the imported data. Hydrator provides capabilities to store data in time-partitioned directories via a built-in CDAP Dataset called Time-partitioned File Set. Once the data is stored in the fileset, CDAP automatically adds a partition which can be queried using Hive.
In this use case, we will configure a Time-partitioned File Set that stores data in Avro format by using TPFSAvro as the sink.
I like the fact that there’s a UI for this. Between this tool and NiFi, the Hadoop ecosystem is getting some tools to make data migration easier to understand, and I think that will help adoption.
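To make the time-partitioning idea from the excerpt a bit more concrete, here is a rough Python sketch of how a sink might map a pipeline run time to a directory and a set of partition keys that a query engine such as Hive could filter on. This is not CDAP's code: the function names, the `customers_avro` base path, and the `yyyy-MM-dd/HH-mm` style layout are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath


def partition_path(base_path: str, run_time: datetime) -> str:
    """Build a hypothetical output directory for one run, bucketed by minute.

    The date/time layout here is an assumed convention, not a documented one.
    """
    day_part = run_time.strftime("%Y-%m-%d")
    minute_part = run_time.strftime("%H-%M")
    return str(PurePosixPath(base_path) / day_part / minute_part)


def partition_keys(run_time: datetime) -> dict:
    """Return the time fields that would identify this partition in a catalog."""
    return {
        "year": run_time.year,
        "month": run_time.month,
        "day": run_time.day,
        "hour": run_time.hour,
        "minute": run_time.minute,
    }


if __name__ == "__main__":
    run = datetime(2017, 3, 5, 14, 30)
    print(partition_path("customers_avro", run))
    print(partition_keys(run))
```

Because each run writes to its own directory and registers matching keys, a query that filters on, say, `year` and `month` only needs to read the directories for that period rather than the whole dataset.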