The thing about this common pattern is that we need to wait for all of these back-end requests to complete before returning to the user. So improving the average latency of these requests doesn’t help us one little bit.
Since each request must wait for all of its queries to complete, the overall request latency is defined to be the latency of the request’s slowest query. Even if almost all backends have low tail latencies, the tail latency of the maximum of several queries could be high.
(See ‘The Tail at Scale’).
The user can easily see P99 latency or greater.
Techniques to mitigate tail latencies include making redundant requests, clever use of scheduling, auto-scaling and capacity provisioning, and approximate computing. Robin Hood takes a different (complementary) approach: use the cache to improve tail latency!
Robin Hood doesn’t necessarily allocate caching resources to the most popular back-ends, instead, it allocates caching resources to the backends (currently) responsible for the highest tail latency.
This is a great review of an interesting algorithm.