I'm testing a prototype that uses Cassandra as its back-end storage. The application is simple and handles user authentication: a user logs in, then the application fetches the user's profile and shows the details on a web page.
I hit a performance problem with buddy-related operations. Every user may have 0 to 20 buddies, and I want to show each buddy's last login time on the result page; in fact I retrieve everything for those buddies. My first and most direct implementation used a user object to fetch the data for each buddy. Obviously this is not good: for every user object the client has to access the Cassandra cluster, and the TCP round trips are painful.
Then I added code to the user object's constructor to load all the buddies' info in one shot (Cassandra's multiget_slice API). Things got better, but this doesn't seem reasonable to me: most of the time (during authentication, for example) we don't need the buddy info, and fetching it is just a waste of time.
So I added a new method to the user class, called load_buddies, which loads the buddies' info on demand. This makes authentication pretty fast while still keeping the ability to load the buddies' info in one batch.
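Here is a rough sketch of the on-demand idea, not the prototype's real code. FakeStore is a hypothetical in-memory stand-in for the Cassandra client that only counts round trips; its multiget method plays the role that multiget_slice plays in the real thing. The row layout (password, buddy_ids, last_login) and the authenticate method are assumptions made up for the illustration.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a Cassandra client; counts round trips."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):
        # One simulated network round trip per key.
        self.round_trips += 1
        return self.rows[key]

    def multiget(self, keys):
        # One simulated round trip for the whole batch of keys.
        self.round_trips += 1
        return {k: self.rows[k] for k in keys if k in self.rows}


class User:
    def __init__(self, store, user_id):
        self.store = store
        self.user_id = user_id
        self.profile = store.get(user_id)  # only the user's own row
        self._buddies = None               # buddies are not loaded yet

    def authenticate(self, password):
        # Plain comparison for illustration only; a real app would check a hash.
        return self.profile["password"] == password

    def load_buddies(self):
        # Fetch all buddy rows in one batch, the first time they are needed.
        if self._buddies is None:
            self._buddies = self.store.multiget(self.profile["buddy_ids"])
        return self._buddies


# Made-up sample data.
rows = {
    "alice": {"password": "s3cret", "buddy_ids": ["bob", "carol"], "last_login": 100},
    "bob": {"password": "x", "buddy_ids": [], "last_login": 90},
    "carol": {"password": "y", "buddy_ids": [], "last_login": 95},
}

store = FakeStore(rows)
user = User(store, "alice")
authenticated = user.authenticate("s3cret")
trips_after_auth = store.round_trips      # authentication touched one row

buddies = user.load_buddies()
user.load_buddies()                       # second call reuses the cached result
trips_after_buddies = store.round_trips   # one extra trip for all buddies

last_logins = {name: row["last_login"] for name, row in buddies.items()}
```

In this sketch authentication costs one round trip, and showing the buddies costs one more no matter how many buddies there are, instead of one per buddy.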
After all this, the performance is … still not good. My test case has one login failure in every ten requests; for a successfully logged-in user, I show the buddy IDs and last access times, and also update the user's last login time. With my current settings, the worst response time is about a second, while 90% of requests finish in less than 600ms.
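For reference, a figure like "90% of requests under 600ms" is a 90th percentile of the response times. A minimal sketch of how such a number can be computed, using the nearest-rank method (one common definition, not necessarily the one the load testing tool uses); the latency values are invented for the example and are not my measurements:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented response times in milliseconds, for illustration only.
latencies_ms = [120, 180, 210, 250, 300, 340, 410, 480, 590, 980]

p90 = percentile(latencies_ms, 90)   # 590 for this sample
worst = max(latencies_ms)            # 980 for this sample
```

The worst case and the 90th percentile can be far apart, which is why I look at both.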
There must be something that can be tuned, though the VM could be the reason for the slowness. I will check the following:
- Apache HTTPd configuration: prefork seems to perform better than worker, and there may be more that can be tuned in both HTTPd and wsgi
- Python class optimization: I will review the implementation of the user class, as I don't want to make it too complicated to use
- Cassandra performance: this is what I'm actually worried about. During the tests, CPU utilization on the Cassandra boxes is about 80% (roughly 70% user, 10% sys), so it could be the bottleneck
Without the buddy operation everything is fine: the worst response time is about 600ms, and 90% of requests finish in under 400ms. Relationships are a pain, and they are the bottleneck, but in this social era no web application can live without relationships …
BTW, my testing environment:
- The test client runs on a PowerBook, using ab; I will check whether any other tool could be useful
- The servers all run on the same physical box, managed by Proxmox: a web server, an LVS director (to load balance the Cassandra nodes), and 3 Cassandra nodes
- The server box uses Ethernet and the PowerBook is on wireless. I don't think this is an issue, as the connect time is pretty low.