Simon Whiteley looks at the current status of the Lakehouse model:
We have discussed in the past this idea of the lakehouse, the aspirational target of many analytics platforms these days of combining the huge power and potential of data lakes with the rigour, reliability and concurrency of a data warehouse. It’s an interesting concept but has, in the past, been firmly an aspiration.
In the world without lakehouses, we often see the “Modern Data Warehouse”, this two-phased approach to providing a holistic platform – we load our early data into a lake where we shape it and massage it into an understandable state. It is here we perform data science, exploratory data analysis, early sight analytics prototyping and various other functions that don’t quite fit into a data warehouse… but then we load our data into a relational store for serving to the business. This is where we can meet their demands for a rich SQL environment, auditable data models and rigorous change procedures. Essentially, we store data twice so that we can achieve the best of both worlds.
Definitely read Simon’s take on it. My take is that the Lakehouse concept will start to be useful to specific companies in about 2-3 years, as I don’t think the performance is there today.