Caching Strategy

Kevin Gessner explains some caching concepts used at Etsy:

A major drawback of modulo hashing is that the size of the cache pool needs to be stable over time.  Changing the size of the cache pool will cause most cache keys to hash to a new server.  Even though the values are still in the cache, if the key is distributed to a different server, the lookup will be a miss.  That makes changing the size of the cache pool—to make it larger or for maintenance—an expensive and inefficient operation, as performance will suffer under tons of spurious cache misses.

For instance, if you have a pool of 4 hosts, a key that hashes to 500 will be stored on pool member 500 % 4 == 0, while a key that hashes to 1299 will be stored on pool member 1299 % 4 == 3.  If you grow your cache by adding a fifth host, the cache pool calculated for each key may change. The key that hashed to 500 will still be found on pool member 500 % 5 == 0, but the key that hashed to 1299 be on pool member 1299 % 5 == 4. Until the new pool member is warmed up, your cache hit rate will suffer, as the cache data will suddenly be on the ‘wrong’ host. In some cases, pool changes can cause more than half of your cached data to be assigned to a different host, slashing the efficiency of the cache temporarily. In the case of going from 4 to 5 hosts, only 20% of cache keys will be on the same host as before!

It’s interesting reading.

Related Posts

Event Sourcing On Kafka

Adam Warski shows how you can use Apache Kafka as your event sourcing data source: There’s a number of great introductory articles, so this is going to be a very brief introduction. With event sourcing, instead of storing the “current” state of the entities that are used in our system, we store a stream of events that relate to these […]

Read More

Avoid Scalar Functions In Computed Columns

Daniel Hutmacher shows why you should not include scalar functions inside computed column definitions: Scalar functions can be a real headache when you’re performance tuning. For one, they don’t parallelize. In fact, if you use a scalar function in a computed column, it will prevent any query that uses that table from going parallel – even if you […]

Read More


December 2017
« Nov Jan »