Jan 19, 2010
 

I set up a testing environment on a couple of company boxes to see how Cassandra performs on real machines ("real" here means powerful enough to be a data node). Here are the details of the environment:

  • Two client nodes and one server node, all running RHEL 4.x. I use two client nodes because during the performance test a single client machine could not generate enough load
  • All three machines have 8 cores and 16G of memory (well, memory is not a big deal for my tests)
  • Running Cassandra 0.5.0 RC3 (built from svn last night)
  • The client is written in Python
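For context, the client-side load generator is nothing special. A minimal sketch of the idea in Python (illustrative only: the function names are made up, and `lookup` stands in for the real Thrift read):

```python
import time
import threading

def run_load(lookup, keys, duration=10.0, threads=8):
    """Drive lookup(key) from several threads; report QPS and average latency.

    `lookup` is a placeholder for the real single-key Cassandra read.
    """
    stats = []                       # (request count, total latency) per thread
    lock = threading.Lock()
    deadline = time.time() + duration

    def worker():
        count, total, i = 0, 0.0, 0
        while time.time() < deadline:
            start = time.time()
            lookup(keys[i % len(keys)])
            total += time.time() - start
            count += 1
            i += 1
        with lock:
            stats.append((count, total))

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()

    count = sum(c for c, _ in stats)
    total = sum(s for _, s in stats)
    qps = count / duration
    avg_ms = (total / count) * 1000 if count else 0.0
    return qps, avg_ms
```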

Here is the graph for simple requests (single-key lookup):

The result is pretty encouraging: queries per second grow almost linearly, and at about 5,000 QPS the overall CPU utilization is still under 40% (25% user, 12% sys). I cannot get more client boxes to test with, but if the trend holds and we take 80% CPU utilization as the threshold, this kind of box should handle roughly 10K QPS with latency around 3ms.

Note that CPU utilization, QPS per client, and latency are hard to read here because the overall QPS is so high, but you can get some idea from the next graph ...

Here is the graph for the application test (login, which does one user lookup and then 10~100 further lookups, each fetching one buddy's information):

The result worries me a bit: CPU utilization is already at 70% (45% user, 25% sys), so about 200 QPS seems to be what the cluster can provide. However, each login does a lot of table lookups (55 on average), so 200 logins/s works out to roughly 11,000 single-key reads per second, which matches the ~10K QPS per box we saw in the simple test, while latency sits around 80ms.

Actually, that much sys time is pretty bad; it means the kernel is busy context switching (I didn't check vmstat at the time, but this is a reasonable guess). Then again, this may be unavoidable: the machine is handling 16 active clients sending a steady stream of requests, but it has only 8 physical cores, so context switching is inevitable.

Since everything scales roughly linearly, I assume a 4-core box can offer 5,000 QPS with reasonable latency. I will run similar tests against MySQL and memcached, and also with multiple data nodes, since my impression is that a multi-node cluster is far slower than a single node (inter-node communication?).

Dec 08, 2009
 

I'm testing a prototype that uses Cassandra as the back-end storage. The simple application does user authentication: it logs in, fetches the user's profile, and shows the details on a web page.

I hit a performance problem with buddy-related operations. Every user may have 0~20 buddies, and I want to show each buddy's last login time on the result page, so I have to retrieve data for every buddy. The most direct implementation, which I tried first, fetches each buddy through its own user object. Obviously this is not good: every user object makes a separate trip to the Cassandra cluster, and the TCP round trips add up.

Then I changed the user object's constructor to load all buddy info in one shot (Cassandra's multiget_slice API). Things got better, but this doesn't seem reasonable either: most of the time (such as during authentication) we don't need buddy info at all, and fetching it is just a waste of time.

So I added a new method to the user class, load_buddies, which loads buddy info on demand. This keeps authentication fast while preserving the ability to load buddy info in batch mode.
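In rough Python, the pattern looks like this (class and attribute names are made up, and `store` abstracts the real Cassandra client; in the real code the batch fetch is multiget_slice):

```python
class User(object):
    """Sketch of the lazy buddy-loading pattern; `store` stands in for
    the real Cassandra client."""

    def __init__(self, store, uid):
        self.store = store
        self.uid = uid
        self.profile = store.get(uid)   # one lookup: enough for authentication
        self._buddies = None            # not loaded yet

    def load_buddies(self):
        # Fetch all buddies in one batch call (the real client uses
        # Cassandra's multiget_slice here) instead of one get() per buddy.
        if self._buddies is None:
            ids = self.profile.get("buddy_ids", [])
            self._buddies = self.store.multiget(ids)
        return self._buddies
```

Authentication touches only the constructor's single lookup; pages that need buddies pay for exactly one extra batch round trip.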

After all this the performance is ... still not good. My test case is one login failure every ten requests; for a successfully logged-in user I show each buddy's id and last access time, and also update the user's last login time. With my current setup, the worst response time is about a second, while 90% of requests finish in less than 600ms.

There must be something that can be tuned, though the VMs could be the reason for the slowness. I will check the following:

  • Apache HTTPd configuration: prefork seems to perform better than worker, and there may be more to tune in both HTTPd and wsgi
  • Python class optimization: I will review the implementation of the user class, though I don't want to make it too complicated to use
  • Cassandra performance: this is actually what I'm worried about, since during the tests the Cassandra boxes' CPU utilization is about 80% (roughly 70% user, 10% sys); it could be the bottleneck

Without the buddy operations everything's fine: the worst response time is about 600ms, and 90% of requests are below 400ms. Relationships are a pain and are the bottleneck, but in this social era no web application can live without relationships ...

BTW, my testing environment:

  • The test client runs on a PowerBook, using ab; I will check whether there is anything better
  • The servers all run on the same physical box under Proxmox: a web server, an LVS director (to load balance the Cassandra nodes), and 3 Cassandra nodes
  • The server box uses Ethernet; the PowerBook is on wireless. I don't think this is an issue, as connect time is pretty low.
Nov 20, 2009
 

I recently spent a lot of time reading articles from Linux Magazine, which always introduces pretty new (or not new, but less known) technologies and products. This time it is Proxmox, an open source virtualization product.

I'm running VMWare Go at home at the moment, but I failed to solve the license issue, so every 60 days I have to re-install everything. I guess I got the wrong ISO, and what I installed was actually for vSphere, but I just don't want to spend too much time digging into it, as VMWare obviously does not want people to find the right solution easily.

Xen is another story; at least to me it is not easy to use. Maybe the next version will be easier (I should give it a try too, but I lack machines ...). Also, I still have the impression that running Xen on Ubuntu is painful, as it is, kind of, tightly bundled with the RedHat distros.

Now here comes Proxmox, which seems promising. I will give it a try today (maybe over the weekend as well); if it works I will stick with it, and if not ... I will try out Xen.

Let’s see.

Oct 23, 2009
 

I've read too many articles about this NoSQL stuff; now I need a plan to proceed (with what? :-W).

First of all, I'm going to remove the MySQL and OpenLDAP installations from my testing environment :P. MySQL feels slow to me even though I have plenty of experience setting it up, including replication, etc., and I will check whether all the applications can be built on a key-value data store (see below). OpenLDAP is another story: I still haven't figured out how to set it up with replication. The last versions I tried were 2.2/2.3, and 2.4 introduces a whole new approach to replication, so I'm going to leave it untouched for now. Note that I still need to come back to LDAP later, since it remains a perfect fit for corporate-style applications, such as what I did a couple of times before: integrating mail, IM, wiki, blog, etc. under a single user-id.

OK, back to NoSQL, couple of things to do:

  1. Consistent hashing. I still need to read all those articles and try out different implementations. I don't think I will write my own, but I need something that works on Linux and Windows (OSX? I don't think so) and supports the major programming languages (C/C++, PHP/Python, Java). I also need to run tests similar to what I've already done and understand how it affects deployment.
  2. Try out different engines. Most likely I won't try anything too fancy (read: "complicated"); for example, I will prefer Redis over memcachedb simply because memcachedb's replication is not that simple to me, and I believe anything complicated to set up will be a headache to maintain. I will also skip the so-called document stores/graph stores unless they can support a simple key-value store at the same performance (then those features become a nice add-on). I don't have the list so far, but I will put one together in the coming weekend. Things to be tested include installation, replication, fail-over, backup and recovery, monitoring, etc. Programming language support is another important factor; I want a list similar to item #1's.
  3. Applications. I'm going to collect the "traditional" web features that involve a data store and work out how to implement them on a distributed key-value store. For example, user registration, login, and editing preferences/profile form one fundamental feature set, and buddy-related operations (add as buddy, blacklist, check online status, notify a buddy of an event / be notified by a buddy) are another. Also on my mind are messaging features (internal/external IM/mail), posting features (threaded posts like a forum; votes/surveys may fall in this category as well), and maybe some search features. I don't think I can come up with a full list in the coming days, but I will keep posting here.

This is pretty much what's on my mind. All this stuff seems new and almost none of it is well packaged, so after 4~5 years of using yum/apt, I now need to do what I used to do: build everything from scratch. If I have time, I will compose some packages to ease my deployment.

Oct 23, 2009
 

Most NoSQL solutions are a kind of cache with a persistent data store behind them, with or without replication support. One of the key issues in a production environment is using consistent hashing to avoid cache failures when the node set changes.

I talked to a friend a few days ago about a memcached deployment problem. He asked what to do when adding a new memcached node to expand capacity, to avoid reloading a pile of data from the database into the cache nodes. I told him I had no direct experience, but if I hit this problem I would try restarting the memcached client machines one by one with the new configuration, to avoid putting massive load on the database all at once; I would also think about changing the memcached client's hashing function, to maximize the number of entries whose partition stays unchanged.

It turned out my second idea is the right one (I should have read all those articles before talking to him :P). There are a couple of articles discussing this issue, and the good starting point, of course, is Wikipedia.
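The core idea fits in a few lines. Here is a toy ring in Python with virtual points per node (this is not libketama, just an illustration; all names are made up):

```python
import hashlib
import bisect

class Ring(object):
    """Toy consistent-hash ring: each node gets `replicas` points on a
    circle, and a key maps to the first node point at or after its hash."""

    def __init__(self, nodes, replicas=100):
        self.points = []                      # sorted (hash, node) pairs
        for node in nodes:
            for i in range(replicas):
                self.points.append((self._hash("%s:%d" % (node, i)), node))
        self.points.sort()

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self.points, (self._hash(key), ""))
        return self.points[i % len(self.points)][1]

# Measure how many keys move when one node is added.
keys = ["key-%d" % i for i in range(10000)]
before = Ring(["n1", "n2", "n3", "n4"])
after = Ring(["n1", "n2", "n3", "n4", "n5"])
moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
print("%.1f%% of keys moved" % (100.0 * moved / len(keys)))
```

Adding a node only pulls keys onto the new node, so the moved fraction should be close to the new node's share of capacity (roughly 20% when going from 4 equal nodes to 5).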

I tried libketama, which seems pretty good in terms of retention rate. I ran some tests resembling a (sort of) real-world use case: say we have 4 weak (512M) nodes and want to replace them with new nodes of double the capacity (1G). I add the new nodes to the cluster one by one, then remove the old nodes one by one. Here is what I got:

cluster          capacity   capacity changed   keys moved
4x512M           2G         0%                 0%
4x512M + 1x1G    3G         50%                40%
4x512M + 2x1G    4G         33%                30%
4x512M + 3x1G    5G         25%                25%
4x512M + 4x1G    6G         20%                20%
3x512M + 4x1G    5.5G       8%                 12%
2x512M + 4x1G    5G         9%                 13%
1x512M + 4x1G    4.5G       10%                18%
4x1G             4G         11%                19%

Relatively speaking, the percentage of keys moved to other partitions is close to the capacity change, which means it is close to the optimal number.

And the key distribution is pretty even (each cell is capacity share / measured utilization; nodes #1~#4 are 512M, #5~#8 are 1G):

cluster          node #1      node #2      node #3      node #4      node #5      node #6      node #7      node #8
4x512M           25.0%/25.6%  25.0%/21.7%  25.0%/24.7%  25.0%/28.0%
4x512M + 1x1G    16.7%/16.9%  16.7%/15.2%  16.7%/19.0%  16.7%/17.7%  33.3%/31.1%
4x512M + 2x1G    12.5%/13.5%  12.5%/10.8%  12.5%/13.7%  12.5%/12.7%  25.0%/24.5%  25.0%/24.8%
4x512M + 3x1G    10.0%/10.9%  10.0%/9.4%   10.0%/11.0%  10.0%/8.3%   20.0%/19.6%  20.0%/20.0%  20.0%/20.9%
4x512M + 4x1G    8.3%/8.9%    8.3%/8.3%    8.3%/8.1%    8.3%/7.0%    16.7%/16.7%  16.7%/17.1%  16.7%/17.9%  16.7%/16.1%
3x512M + 4x1G    9.1%/9.0%    9.1%/9.6%    9.1%/8.2%    18.2%/17.5%  18.2%/18.3%  18.2%/19.8%  18.2%/17.6%
2x512M + 4x1G    10.0%/9.7%   10.0%/8.9%   20.0%/20.3%  20.0%/20.5%  20.0%/21.9%  20.0%/18.6%
1x512M + 4x1G    11.1%/9.2%   22.2%/22.3%  22.2%/22.2%  22.2%/25.2%  22.2%/21.1%
4x1G             25.0%/24.2%  25.0%/24.5%  25.0%/27.2%  25.0%/24.1%

I still need to try out FNV to see if it gives a better distribution and/or less key movement; the article above says it at least has better performance.
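For reference, FNV-1a itself is tiny; a sketch of the 32-bit variant using the published offset basis and prime, easy to swap into a ring for a distribution comparison:

```python
def fnv1a_32(data):
    """32-bit FNV-1a hash of a bytes object."""
    h = 2166136261                      # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 16777619) & 0xffffffff  # FNV prime, kept to 32 bits
    return h

# Known test vector: fnv1a_32(b"a") == 0xe40c292c
```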

Sep 25, 2009
 

There was not much progress with the performance test mentioned here.

I first tested LDAP vs. memcached and found there was not much difference between the two, which is suspicious. So I started running tests comparing a pure database solution against a memcached solution, which ideally should show a lot of difference, but again I failed to see it.

I still haven't figured out the real cause. It seems VMs are not well suited to this kind of test: given limited resources (CPU, memory, network, disk, etc.), all the VMs compete with each other. Again, this is my guess, and I will try to work out the best way to run performance tests in a VM environment.

Even though I failed, the exercise shows that VMs are not good for performance testing. Nice try 🙂

Sep 10, 2009
 

I think I’m a project/feature killer.

A couple of months back I was asked to test the performance of feature GMF, and since the performance was really bad (sorry, 10x slower), we had to remove that fancy feature from two major systems' roadmaps;

A couple of weeks ago I was asked to test the performance of feature TRF; the resource utilization was not that cool (20~40% more resources), so we had to cancel the (again, fancy) feature from all roadmaps;

A couple of hours ago I started to test the new (sure enough, fancy) build system, and it seems the program does not run at all. I don't want to predict anything, but based on my past record ... you know it.

Aug 27, 2009
 

I don't run auto-update on all those machines, so I have to run these every couple of days:

ssh -t ubuntu "sudo apt-get update && sudo apt-get -y dist-upgrade"
ssh -t debian "sudo apt-get update && sudo apt-get -y dist-upgrade"
ssh -t fedora "sudo yum -y update"
ssh -t centos "sudo yum -y update"
ssh -t opensuse "sudo zypper -n update"
ssh -t mandriva "sudo urpmi --auto-update --auto-orphans --force"
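Since the list keeps growing, a small wrapper saves some retyping; a Python sketch (host aliases as above; the `runner` parameter exists only so the loop can be dry-run without touching real hosts):

```python
import subprocess

# Per-host upgrade commands, keyed by ssh host alias.
COMMANDS = {
    "ubuntu":   "sudo apt-get update && sudo apt-get -y dist-upgrade",
    "debian":   "sudo apt-get update && sudo apt-get -y dist-upgrade",
    "fedora":   "sudo yum -y update",
    "centos":   "sudo yum -y update",
    "opensuse": "sudo zypper -n update",
    "mandriva": "sudo urpmi --auto-update --auto-orphans --force",
}

def update_all(runner=subprocess.call):
    # Run each host's command over ssh, one host at a time.
    for host, cmd in sorted(COMMANDS.items()):
        runner(["ssh", "-t", host, cmd])
```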

Jun 19, 2009
 

Testing smilies.

I think I got it fixed. The problem is that "/" is used as the delimiter of the regular expression, and if "/" is part of a smiley's pattern, the whole expression breaks.

So I'm defining a delimiter variable in wp-includes/functions.php, in the function smilies_init(), using a character that can (well, sort of) never appear in text; I'm using "\001" for now. Then I replace all occurrences of "/" with $delimiter while composing $wp_smiliessearch, and that's it.
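The fix above is PHP-specific, but the underlying pitfall (a literal "/" in a smiley colliding with the pattern delimiter) exists whenever a regex is built from literal text. A Python sketch with a made-up smiley table, where escaping each pattern plays the role the new delimiter plays in PHP:

```python
import re

# Hypothetical smiley table; note ":-/" contains the "/" character that
# broke WordPress's "/"-delimited pattern.
smilies = {":-)": "smile.gif", ":-/": "hm.gif", "#:-s": "eek.gif"}

# Build one alternation, escaping each pattern so characters like "/",
# "#" and "-" are treated literally; longest-first ordering keeps longer
# smilies from being half-matched by shorter ones.
search = re.compile(
    "|".join(re.escape(s) for s in sorted(smilies, key=len, reverse=True))
)

print(search.findall("hello :-/ world :-)"))
```

Sorting the patterns longest-first is the same trick WordPress relies on so that a smiley like #:-s is not partially consumed by a shorter pattern.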

The only open issue, which I don't think is related, is that of two consecutive #:-s smilies only one shows up. Still debugging ...