Oct 23, 2009

Most NoSQL solutions are essentially caches backed by a persistent data store, with or without replication support. One of the key issues in a production environment is using consistent hashing, so that adding or removing a node does not turn most of the cache into misses.

A few days ago I talked with a friend about a memcached deployment problem. He asked what to do when adding a new memcached node to expand capacity, without having to reload a large amount of data from the database into the cache nodes. I told him I had no experience with it, but if I ran into this problem I would try two things: restart the memcached client machines one by one to pick up the new configuration, so the database does not take a massive load all at once; and change the hashing function used by the memcached client, to maximize the number of entries that stay in the same partition.
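To see why adding a node hurts with naive client-side hashing, here is a minimal sketch, assuming the client picks a server as hash(key) mod N. MD5 is used only as a convenient uniform hash, and the key names are made up.

```python
import hashlib


def naive_server(key: str, num_servers: int) -> int:
    """Pick a server index as hash(key) mod N (the naive scheme)."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "little")
    return h % num_servers


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys that land on a different server after resizing."""
    moved = sum(naive_server(k, old_n) != naive_server(k, new_n) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"key:{i}" for i in range(100_000)]  # hypothetical key set
    print(f"4 -> 5 servers: {moved_fraction(keys, 4, 5):.1%} of keys moved")
```

A key keeps its server only when h mod 4 equals h mod 5, which holds for 4 out of every 20 hash values, so roughly 80% of the keys move and most of the cache is effectively reloaded from the database.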

It turned out my second idea was correct (I should have read all those articles before talking to him :P). There are a couple of articles discussing this issue, and a good starting point, of course, is Wikipedia.
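The core idea from those articles, as a minimal Python sketch (a generic ring, not libketama itself): each server is hashed to many points on a 32-bit circle, and a key belongs to the first server point at or after the key's own hash, wrapping around at the top. The class name, the `node#i` point labels and the default of 100 points per server are arbitrary choices for illustration.

```python
import bisect
import hashlib


def _hash(s: str) -> int:
    """32-bit position on the ring, taken from the first 4 bytes of MD5."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "little")


class HashRing:
    """Minimal consistent hash ring; each node is placed at `replicas` points."""

    def __init__(self, nodes=(), replicas: int = 100):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            p = _hash(f"{node}#{i}")  # hypothetical point label scheme
            if p not in self._owner:  # skip the (rare) position collision
                self._owner[p] = node
                bisect.insort(self._points, p)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first node point."""
        i = bisect.bisect_left(self._points, _hash(key))
        if i == len(self._points):  # past the highest point: wrap to the start
            i = 0
        return self._owner[self._points[i]]
```

Because a new server only claims the arcs that end at its own points, every key that moves after `add()` moves to the new server, and all other keys stay where they were; `remove()` hands those arcs back.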

I tried libketama, and it looks pretty good in terms of retention rate. I ran some tests that resemble a (sort of) real-world use case. Say we have 4 weak (512M) nodes and want to replace them all with new nodes of double capacity (1G). I add the new nodes to the cluster one by one, then remove the old nodes one by one. Here is what I got:

cluster         capacity  capacity changed  keys moved
4x512M          2G        0%                0%
4x512M + 1x1G   3G        50%               40%
4x512M + 2x1G   4G        33%               30%
4x512M + 3x1G   5G        25%               25%
4x512M + 4x1G   6G        20%               20%
3x512M + 4x1G   5.5G      8%                12%
2x512M + 4x1G   5G        9%                13%
1x512M + 4x1G   4.5G      10%               18%
4x1G            4G        11%               19%
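The "capacity changed" column appears to be the size of each step divided by the total capacity before that step (my reading of the numbers; the post does not say so explicitly). A quick check of that reading:

```python
def capacity_change(old_gb: float, new_gb: float) -> float:
    """Size of one step relative to the cluster capacity before the step."""
    return abs(new_gb - old_gb) / old_gb


# Total cluster capacity (GB) after each step in the table above.
totals = [2, 3, 4, 5, 6, 5.5, 5, 4.5, 4]
changes = [capacity_change(a, b) for a, b in zip(totals, totals[1:])]
print([f"{c:.0%}" for c in changes])
# ['50%', '33%', '25%', '20%', '8%', '9%', '10%', '11%']
```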

In each step, the percentage of keys moved to other partitions is close to the relative capacity change, which suggests libketama gets close to the best achievable number.
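To replay one step of this experiment without a memcached cluster, here is a rough Python model of a libketama-style continuum. It follows my understanding of libketama's scheme (each server gets floor(weight / total_weight * 40 * server_count) MD5 digests of "address-k", and each digest yields four 32-bit points), but it is a sketch, not libketama's API, and the server addresses are hypothetical.

```python
import bisect
import hashlib
import math


def build_continuum(servers: dict[str, int]):
    """Sorted (points, owners) lists for a ketama-style weighted ring (assumed scheme)."""
    total, n = sum(servers.values()), len(servers)
    entries = []
    for addr, weight in servers.items():
        # Weighted number of MD5 digests for this server (40 each if weights are equal).
        for k in range(math.floor(weight / total * 40.0 * n)):
            digest = hashlib.md5(f"{addr}-{k}".encode()).digest()
            # Each 16-byte digest gives four little-endian 32-bit points.
            for off in (0, 4, 8, 12):
                entries.append((int.from_bytes(digest[off:off + 4], "little"), addr))
    entries.sort()
    return [p for p, _ in entries], [a for _, a in entries]


def lookup(continuum, key: str) -> str:
    """First server point at or after the key's hash, wrapping at the top."""
    points, owners = continuum
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "little")
    return owners[bisect.bisect_left(points, h) % len(points)]


def moved_fraction(keys, old_servers, new_servers) -> float:
    """Share of keys whose server changes between two cluster layouts."""
    old, new = build_continuum(old_servers), build_continuum(new_servers)
    return sum(lookup(old, k) != lookup(new, k) for k in keys) / len(keys)


if __name__ == "__main__":
    # Hypothetical addresses; weights are memory sizes in MB.
    old = {f"10.0.0.{i}:11211": 512 for i in range(1, 5)}  # 4x512M
    new = {**old, "10.0.1.1:11211": 1024}                   # + 1x1G
    keys = [f"key:{i}" for i in range(100_000)]
    print(f"keys moved: {moved_fraction(keys, old, new):.1%}")
```

Under this model, adding the 1G node also shrinks each 512M server from 40 to 33 digests (floor(512 / 3072 * 40 * 5)), so some keys shuffle among the old servers as well; the printed figure should land in roughly the same range as the 40% in the table, though it depends on the key set and addresses.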

The key distribution is also pretty even. Each cell below is a node's share of total capacity versus its actual share of keys (capacity/utilization); nodes #1~#4 are 512M and #5~#8 are 1G:

cluster         512M nodes (capacity/utilization, %)          1G nodes (capacity/utilization, %)
4x512M          25.0/25.6  25.0/21.7  25.0/24.7  25.0/28.0
4x512M + 1x1G   16.7/16.9  16.7/15.2  16.7/19.0  16.7/17.7    33.3/31.1
4x512M + 2x1G   12.5/13.5  12.5/10.8  12.5/13.7  12.5/12.7    25.0/24.5  25.0/24.8
4x512M + 3x1G   10.0/10.9  10.0/9.4   10.0/11.0  10.0/8.3     20.0/19.6  20.0/20.0  20.0/20.9
4x512M + 4x1G   8.3/8.9    8.3/8.3    8.3/8.1    8.3/7.0      16.7/16.7  16.7/17.1  16.7/17.9  16.7/16.1
3x512M + 4x1G   9.1/9.0    9.1/9.6    9.1/8.2                 18.2/17.5  18.2/18.3  18.2/19.8  18.2/17.6
2x512M + 4x1G   10.0/9.7   10.0/8.9                           20.0/20.3  20.0/20.5  20.0/21.9  20.0/18.6
1x512M + 4x1G   11.1/9.2                                      22.2/22.3  22.2/22.2  22.2/25.2  22.2/21.1
4x1G                                                          25.0/24.2  25.0/24.5  25.0/27.2  25.0/24.1
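Each capacity figure above is a node's memory divided by the cluster total, and each utilization figure is its measured share of keys. A small helper for turning per-node key counts into those two numbers plus their ratio, as a sketch; the node labels are illustrative, and the example reuses the measured percentages from the 4x512M + 1x1G row as relative counts.

```python
def distribution_report(sizes_mb: dict[str, int], key_counts: dict[str, float]):
    """Per node: (capacity share, utilization, utilization / capacity share)."""
    total_size = sum(sizes_mb.values())
    total_keys = sum(key_counts.values())
    report = {}
    for node, size in sizes_mb.items():
        capacity = size / total_size
        utilization = key_counts.get(node, 0) / total_keys  # missing node = no keys
        report[node] = (capacity, utilization, utilization / capacity)
    return report


if __name__ == "__main__":
    # 4x512M + 1x1G row; the measured percentages serve as relative key counts.
    sizes = {"#1": 512, "#2": 512, "#3": 512, "#4": 512, "#5": 1024}
    measured = {"#1": 16.9, "#2": 15.2, "#3": 19.0, "#4": 17.7, "#5": 31.1}
    for node, (cap, util, ratio) in distribution_report(sizes, measured).items():
        print(f"node {node}: capacity {cap:.1%}, utilization {util:.1%}, ratio {ratio:.2f}")
```

A ratio above 1 means the node holds more keys than its memory share would suggest; the spread of ratios across nodes is a compact way to compare hash functions on evenness.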

I still need to try FNV to see whether it gives a better distribution and/or moves fewer keys; the article mentioned above says that, at the very least, it performs better.
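For anyone repeating the test with FNV: the 32-bit FNV-1a variant (assumed here; the article may have meant FNV-1 or a 64-bit version) starts from the offset basis 2166136261 and, for each input byte, XORs the byte into the hash and then multiplies by the FNV prime 16777619 modulo 2^32. A minimal Python sketch:

```python
FNV32_OFFSET_BASIS = 2166136261
FNV32_PRIME = 16777619


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime mod 2**32."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    print(hex(fnv1a_32(b"")))   # 0x811c9dc5 (empty input yields the offset basis)
    print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

Swapping a function like this in for MD5 when hashing keys and server points is the comparison described above.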

  2 Responses to “NoSQL – start with consistent hashing”

  1. […] some major programming languages (C/C++, PHP/Python, Java). I also need to do test similar to what I’ve done and understand how it affects […]

  2. […] best one from their perspective, however, just curious how things are going and also to practice consistent hashing, I wrote a simple perl […]
