Caching Strategy

Kevin Gessner explains some caching concepts used at Etsy:

A major drawback of modulo hashing is that the size of the cache pool needs to be stable over time.  Changing the size of the cache pool will cause most cache keys to hash to a new server.  Even though the values are still in the cache, if the key is distributed to a different server, the lookup will be a miss.  That makes changing the size of the cache pool—to make it larger or for maintenance—an expensive and inefficient operation, as performance will suffer under tons of spurious cache misses.

For instance, if you have a pool of 4 hosts, a key that hashes to 500 will be stored on pool member 500 % 4 == 0, while a key that hashes to 1299 will be stored on pool member 1299 % 4 == 3. If you grow your cache by adding a fifth host, the pool member calculated for each key may change. The key that hashed to 500 will still be found on pool member 500 % 5 == 0, but the key that hashed to 1299 will be on pool member 1299 % 5 == 4. Until the new pool member is warmed up, your cache hit rate will suffer, as the cache data will suddenly be on the ‘wrong’ host. In some cases, pool changes can cause more than half of your cached data to be assigned to a different host, slashing the efficiency of the cache temporarily. In the case of going from 4 to 5 hosts, only 20% of cache keys will be on the same host as before!
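To check the arithmetic in that excerpt, here is a minimal Python sketch of modulo hashing. The names `pool_member` and `fraction_unmoved` are hypothetical helpers for illustration, not anything from Etsy's code, and the sketch assumes hash values are spread evenly by using consecutive integers as stand-ins for real hashes.

```python
def pool_member(key_hash: int, pool_size: int) -> int:
    """Modulo hashing: pick a pool member from an already-computed key hash."""
    return key_hash % pool_size


def fraction_unmoved(old_size: int, new_size: int, sample=range(100_000)) -> float:
    """Share of sampled hash values that map to the same member before and after a resize.

    Assumption: consecutive integers stand in for uniformly distributed hashes.
    """
    same = sum(
        1 for h in sample
        if pool_member(h, old_size) == pool_member(h, new_size)
    )
    return same / len(sample)


if __name__ == "__main__":
    # The two keys from the excerpt, on a pool of 4 and then 5 hosts.
    for key_hash in (500, 1299):
        print(key_hash, pool_member(key_hash, 4), pool_member(key_hash, 5))

    # Fraction of keys that stay on the same host when growing from 4 to 5.
    print(fraction_unmoved(4, 5))
```

A hash value keeps its host only when it leaves the same remainder under both 4 and 5, which happens for 4 out of every 20 consecutive values. That is where the 20% figure comes from.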

It’s interesting reading.

