Most NoSQL solutions are essentially caches backed by a persistent data store, with or without replication support. One of the key issues in a production environment is using consistent hashing to limit the damage when cache nodes are added or fail.
A few days ago I talked to a friend about a memcached deployment problem. He asked what to do when adding new memcached nodes to expand capacity, so as to avoid reloading a huge amount of data from the database into the cache. I told him I didn’t have any experience with this, but that if I ran into the problem I would try two things: restart the memcached client machines one by one so they pick up the new configuration, which spreads the load on the database over time; and change the memcached client’s hashing function to maximize the number of entries that keep their partition unchanged.
It turned out my second idea was the right one (I should have read all those articles before talking to him :P). There are a couple of articles discussing this issue, and the best starting point, of course, is Wikipedia.
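To make the idea concrete, here is a minimal consistent-hashing sketch in plain Python (this is not libketama’s actual implementation; MD5, the virtual-node count, and the node names are just illustrative assumptions). Each node is hashed onto a ring many times, a key maps to the first node point at or after its hash, and adding a node therefore steals only a slice of the ring:

```python
# Toy consistent-hash ring: nodes are placed on the ring at many
# "virtual node" points; a key belongs to the first point >= its hash.
import bisect
import hashlib

class Ring:
    def __init__(self, nodes, vnodes=100):
        self.ring = []                      # sorted list of (hash, node)
        for node in nodes:
            self.add(node, vnodes)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node, vnodes=100):
        for i in range(vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get(self, key):
        h = self._hash(key)
        i = bisect.bisect(self.ring, (h, ""))   # first point >= h
        return self.ring[i % len(self.ring)][1]  # wrap around the ring

keys = [f"key{i}" for i in range(10000)]
ring = Ring(["node1", "node2", "node3", "node4"])
placement = {k: ring.get(k) for k in keys}

ring.add("node5")                           # expand the cluster
moved = sum(placement[k] != ring.get(k) for k in keys)
print(f"{100.0 * moved / len(keys):.1f}% of keys moved")  # roughly 1/5
```

With naive modulo hashing (`hash(key) % node_count`), going from 4 to 5 nodes would remap about 80% of the keys; on the ring only about a fifth move.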
I tried libketama, which seems pretty good in terms of retention rate. I ran some tests that resemble a (sort of) real-world use case. Say we have 4 weak (512M) nodes and want to replace them all with new nodes of double the capacity (1G). I add the new nodes to the cluster one by one, then remove the old nodes one by one. Here is what I got:
| cluster | capacity | capacity changed | keys moved |
|---|---|---|---|
| 4x512M | 2G | 0% | 0% |
| 4x512M 1x1G | 3G | 50% | 40% |
| 4x512M 2x1G | 4G | 33% | 30% |
| 4x512M 3x1G | 5G | 25% | 25% |
| 4x512M 4x1G | 6G | 20% | 20% |
| 3x512M 4x1G | 5.5G | 8% | 12% |
| 2x512M 4x1G | 5G | 9% | 13% |
| 1x512M 4x1G | 4.5G | 10% | 18% |
| 4x1G | 4G | 11% | 19% |
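The “capacity changed” column above can be reproduced directly: at each step it is the size of the node being added or removed, divided by the cluster capacity before the step, which is the theoretical lower bound on the fraction of keys that must move. A quick sketch of that arithmetic:

```python
# Walk the upgrade sequence from the table: add four 1G nodes, then
# remove the four 512M (0.5G) nodes, tracking the relative capacity
# change at each step.
steps = [("add", 1.0)] * 4 + [("remove", 0.5)] * 4
capacity = 4 * 0.5                 # start with 4 x 512M = 2G
changed = []
for op, size in steps:
    changed.append(round(100 * size / capacity))  # % relative to old capacity
    capacity += size if op == "add" else -size
print(changed)  # [50, 33, 25, 20, 8, 9, 10, 11] -- matches the table
```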
The percentage of keys moved to other partitions closely tracks the capacity change at each step, which means it is close to the optimum.
And the key distribution is pretty even (each cell shows capacity share / actual utilization; nodes #1~#4 are 512M, #5~#8 are 1G):
| cluster | node #1 | node #2 | node #3 | node #4 | node #5 | node #6 | node #7 | node #8 |
|---|---|---|---|---|---|---|---|---|
| 4x512M | 25.0% / 25.6% | 25.0% / 21.7% | 25.0% / 24.7% | 25.0% / 28.0% | – | – | – | – |
| 4x512M 1x1G | 16.7% / 16.9% | 16.7% / 15.2% | 16.7% / 19.0% | 16.7% / 17.7% | 33.3% / 31.1% | – | – | – |
| 4x512M 2x1G | 12.5% / 13.5% | 12.5% / 10.8% | 12.5% / 13.7% | 12.5% / 12.7% | 25.0% / 24.5% | 25.0% / 24.8% | – | – |
| 4x512M 3x1G | 10.0% / 10.9% | 10.0% / 9.4% | 10.0% / 11.0% | 10.0% / 8.3% | 20.0% / 19.6% | 20.0% / 20.0% | 20.0% / 20.9% | – |
| 4x512M 4x1G | 8.3% / 8.9% | 8.3% / 8.3% | 8.3% / 8.1% | 8.3% / 7.0% | 16.7% / 16.7% | 16.7% / 17.1% | 16.7% / 17.9% | 16.7% / 16.1% |
| 3x512M 4x1G | – | 9.1% / 9.0% | 9.1% / 9.6% | 9.1% / 8.2% | 18.2% / 17.5% | 18.2% / 18.3% | 18.2% / 19.8% | 18.2% / 17.6% |
| 2x512M 4x1G | – | – | 10.0% / 9.7% | 10.0% / 8.9% | 20.0% / 20.3% | 20.0% / 20.5% | 20.0% / 21.9% | 20.0% / 18.6% |
| 1x512M 4x1G | – | – | – | 11.1% / 9.2% | 22.2% / 22.3% | 22.2% / 22.2% | 22.2% / 25.2% | 22.2% / 21.1% |
| 4x1G | – | – | – | – | 25.0% / 24.2% | 25.0% / 24.5% | 25.0% / 27.2% | 25.0% / 24.1% |
I still need to try out FNV to see whether it gives better distribution and/or less key movement; according to the article above, it at least has better performance.
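For reference, 32-bit FNV-1a is small enough to sketch here (using the standard offset basis and prime; this is just the hash function itself, not a full ketama-style continuum built on it):

```python
# 32-bit FNV-1a: XOR each byte into the state, then multiply by the
# FNV prime, keeping only the low 32 bits.
def fnv1a_32(data: bytes) -> int:
    h = 0x811c9dc5                           # FNV-1a 32-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xffffffff    # FNV prime, truncate to 32 bits
    return h

print(hex(fnv1a_32(b"hello")))
```

Its appeal over MD5 here is speed: a couple of integer operations per byte instead of a full cryptographic digest, which matters when hashing every cache key.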