In data analytics, fast query performance is more of a result than a guarantee. What’s more important than the result itself is the architectural design and mechanism that enables quick performance. This is exactly what this post is about. I will put you into context with a typical use case of Apache Doris, an open-source MPP-based analytic database.
The user, in this case, is an all-category Q&A website. As a billion-dollar listed company, they have their own data management platform. What Doris does is support the data filtering, packaging, analyzing, and monitoring workloads of that platform. Based on their huge data size, the user demands quick data loading and quick response to queries.
This sounds a lot like sharding of the data, where you segregate data for a particular customer/entity into its own database (and possibly instance), with the exception that queries are expected to go over a number of shards rather than focus on a single one.