Traditionally data has been residing in silos across the organization and the ecosystem in which it operations (external data). That’s a challenge: you can’t combine the right data to succeed in a big data project if that data is a bit everywhere in and out of the cloud. This is where the idea – and reality – of (big) data lakes comes from.
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.
Although the data lakes serve as a central ingestion point for the plethora of data that organizations seek to gather and mine, it still has various limitations or challenges.
Software is made up of tradeoffs. What you gain by dumping relational structure, you lose in dumping relational structure (and this applies in the opposite direction as well).