Vinod Kumar Vavilapalli describes the modern data lake:
During the past few years though, end-to-end business use-cases have evolved to another level.
- The end-to-end business problems are now mostly solved by multiple applications working together.
- As the platform matured, users have increasingly started wanting to solely focus on the business application layers, and getting impatient to get on with developing their main business-logic.
- However, YARN, and for that matter any other related platform, hasn’t catered to this evolving need, leaving the users to unwillingly get involved in the painstaking details of wiring applications together, keeping them up, manually scaling them as need arises etc.
Manual plumbing of all these different colored services is tiresome! Further, there is a clear need for seamless aggregate deployment, lifecycle management and application wireup. This is the gap that needs to be bridged between what these end-to-end business use-cases need from the platform and what the platform offers today. If these features are provided, then the business use cases authors can singularly focus on the business logic.
This is a higher-level “where are we at?” kind of post that could be helpful if you’re new to the data lake concept.