Data Lake 3.0

Vinod Kumar Vavilapalli describes the modern data lake:

During the past few years though, end-to-end business use-cases have evolved to another level.

  • The end-to-end business problems are now mostly solved by multiple applications working together.
  • As the platform matured, users have increasingly started wanting to solely focus on the business application layers, and getting impatient to get on with developing their main business-logic.
  • However, YARN, and for that matter any other related platform, hasn’t catered to this evolving need, leaving the users to unwillingly get involved in the painstaking details of wiring applications together, keeping them up, manually scaling them as need arises etc.

Manual plumbing of all these different colored services in tiresome! Further, there is a clear need for seamless aggregate deployment, lifecycle management and application wireup. This is the gap that needs to be bridged between what these end-to-end business use-cases need from the platform and what the platform offers today. If these features are provided, then the business use cases authors can singularly focus on the business logic.

This is a higher-level “where are we at?” kind of post which could be helpful if you’re new to the data lake concept.

Related Posts

Kafka Partitioning Strategies

Amy Boyle shares some thoughts on Kafka partitioning strategy: If you have enough load that you need more than a single instance of your application, you need to partition your data. The producer clients decide which topic partition data ends up in, but it’s what the consumer applications will do with that data that drives […]

Read More

Single-Node Hadoop 3 Installation

Mark Litwintschik has a fairly simple guide for installing Hadoop 3 on a single node for testing: This post is meant to help people explore Hadoop 3 without feeling the need they should be using 50+ machines to do so. I’ll be using a fresh installation of Ubuntu 16.04.2 LTS on a single computer. The […]

Read More


March 2017
« Feb Apr »