Mar 05 2010

Sigh …

I got these two names by searching online, and they seem to be great products that could definitely solve some of the headaches I’m currently facing. However …

Then I found out that they are not open source at all – people just publish those articles for … showing off?

Sigh …

Dec 09 2009

It seems the current design cannot get past the bottleneck caused by relationships; I think I need to rethink the design.

It is said that denormalizing at write time is a good approach, as these nosql data stores are really fast on writes. However, it may be too much to do. My current scenario is that every user has 10~100 buddies, and if I denormalize at write time, it is still not clear to me how to set up a schema that fits (a rough sketch of the idea follows).
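
To make the idea concrete to myself, here is a rough sketch of what write-time denormalization could look like. Note that store_column and the UserProfile/BuddyLastLogin names are made up – store_column just stands in for whatever Thrift insert call the Cassandra version at hand exposes:

    import time

    def store_column(row_key, column_name, value):
        # Hypothetical helper standing in for a Cassandra Thrift insert()
        # call; the exact signature depends on the Cassandra version, so
        # this is just a placeholder for "write one column into one row".
        raise NotImplementedError

    def record_login(user_id, buddy_ids):
        # Write-time denormalization: when a user logs in, fan the fresh
        # last-login timestamp out into a row per buddy, so that rendering
        # a buddy list later is one single-row read instead of N lookups.
        now = str(int(time.time()))
        # One write into the user's own profile row ...
        store_column('UserProfile:' + user_id, 'last_login', now)
        # ... plus one write per buddy (10~100 in my case) into a
        # denormalized BuddyLastLogin row keyed by the buddy's id.
        for buddy_id in buddy_ids:
            store_column('BuddyLastLogin:' + buddy_id, user_id, now)

The part that still bothers me is the cost: 10~100 extra writes per login.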

I found I’m still sort of stuck in the RDBMS world; I need to jump into this nosql universe as soon as possible :P.

Dec 08 2009

Testing a prototype that uses Cassandra as the back-end storage. The simple application does user authentication: it logs a user in, fetches the user’s profile, and shows the details on a web page.

I hit a performance problem with the buddy-related operations – every user may have 0~20 buddies, and I want to show each buddy’s last login time on the result page, for which I actually retrieve everything for those buddies. The most direct implementation, which I did first, is to use a user object per buddy to get the data back. Obviously this is not good: for every user object, the client needs to access the Cassandra cluster separately, and the TCP round trips become a pain.

Then I added something to the user object’s constructor to load all buddies’ info in one shot (Cassandra’s multiget_slice API). Things got better, but this doesn’t seem reasonable to me: most of the time (such as during authentication) we don’t need buddy info at all, and fetching it back is just a waste of time.

So I added a new method to the user class, load_buddies, which loads buddy info on demand. This keeps authentication pretty fast while still keeping the ability to load buddy info in batch mode – roughly like the sketch below.
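
Roughly the shape of the user class now – a simplified sketch, where get_profile and multiget_profiles are hypothetical wrappers around the Thrift get and multiget_slice calls, and the profile is assumed to carry the buddy id list:

    class User(object):
        def __init__(self, client, user_id):
            # The constructor only fetches this user's own profile: one
            # round trip, which keeps pure-authentication requests fast.
            self.client = client
            self.user_id = user_id
            self.profile = client.get_profile(user_id)  # hypothetical wrapper
            self._buddies = None

        def load_buddies(self):
            # Fetch all buddy profiles in a single batch call (backed by
            # multiget_slice) instead of one round trip per buddy; only
            # pages that actually render the buddy list pay this cost.
            if self._buddies is None:
                buddy_ids = self.profile.get('buddies', [])
                self._buddies = self.client.multiget_profiles(buddy_ids)
            return self._buddies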

After all this, the performance is … still not good. My test case has one login failure per ten requests; for each successfully logged-in user I show the buddy ids and last access times, and also update the user’s own last login time. With my current setup, the worst response time is about a second, while 90% of the requests complete in less than 600ms.

There must be something that can be tuned, though the VMs could be the reason for the slowness. I will check the following:

  • Apache HTTPd configuration – it seems prefork performs better than worker; there may be more to tune in both HTTPd and wsgi
  • Python class optimization – I will review the implementation of the user class, as I don’t want to make it too complicated to use
  • Cassandra performance – this is actually what worries me: during the tests, the Cassandra boxes’ CPU utilization is around 80% (roughly 70% user, 10% sys), so it could be the bottleneck

Without the buddy operations everything is fine – the worst response time is about 600ms, while 90% of requests are below 400ms. Relationships are a pain; they are the bottleneck. But in this social era, no web application can live without relationships …

BTW, my testing environment:

  • Test client running on the PowerBook, using ab – I will check whether there is anything else that could be useful
  • The servers all run on the same physical box under proxmox; this includes a web server, an LVS director (to load-balance the Cassandra nodes), and 3 Cassandra nodes
  • The server box uses Ethernet; the PowerBook is on wireless. I don’t think this is an issue, as the connect time is pretty low.

Nov 25 2009

Playing with LVS – so that I don’t have to connect to individual Cassandra servers.

What I planned for LVS:

  • 192.168.1.99 will be the VIP
  • f5 (192.168.1.205) will be the LVS director … you are right, its name is f5 😉
  • f1 (192.168.1.101) to f4 (192.168.1.104) will be the real servers
  • will use DR mode (I think it is called single-B on most L4 switches …)

The configuration is actually pretty simple, as long as you get it right. On the LVS director (f5):

  • Configure the VIP on eth0:0 as an alias; the netmask should be 255.255.255.255 and the broadcast the VIP itself
  • Add the following rules with ipvsadm (yea … you need to install this package)
    -A -t 192.168.1.99:9160 -s wlc
    -a -t 192.168.1.99:9160 -r 192.168.1.101:9160 -g -w 1
    -a -t 192.168.1.99:9160 -r 192.168.1.102:9160 -g -w 1
    -a -t 192.168.1.99:9160 -r 192.168.1.103:9160 -g -w 1
    -a -t 192.168.1.99:9160 -r 192.168.1.104:9160 -g -w 1
  • Start LVS (after restarting the network to make sure the VIP is effective) with “ipvsadm --start-daemon master”
  • If you want to stop LVS you can do “ipvsadm --stop-daemon master”

Now let’s turn to the real servers – all the real servers (f1~f4) get the same treatment:

  • Add these lines to /etc/sysctl.conf and then do a “sysctl -p”
    net.ipv4.conf.dummy0.arp_ignore = 1
    net.ipv4.conf.dummy0.arp_announce = 2
    net.ipv4.conf.all.arp_ignore = 1
    net.ipv4.conf.all.arp_announce = 2
  • Configure the VIP on dummy0 as an alias; the netmask should be 255.255.255.255 and the broadcast the VIP itself
  • Change ThriftAddress in Cassandra’s storage-conf.xml to 0.0.0.0, so that Thrift serves on all interfaces
  • Remember to restart Cassandra so the new configuration takes effect

Now launch your favorite client and connect to 192.168.1.99:9160 – you should get everything just as if you had connected to an individual server. A quick sanity check is sketched below.
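
A quick way to sanity-check the setup from any client box – plain sockets, nothing Cassandra-specific. Watch “ipvsadm -L -n” on the director while it runs to see the connections spread across f1~f4:

    import socket

    VIP = ('192.168.1.99', 9160)

    def check_vip(attempts=8):
        # Open a few TCP connections to the VIP; with wlc scheduling
        # the director should spread them across the real servers.
        for i in range(attempts):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.settimeout(3.0)
            try:
                s.connect(VIP)
                print('attempt %d: connected from local port %d'
                      % (i, s.getsockname()[1]))
            finally:
                s.close()

    if __name__ == '__main__':
        check_vip()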

Nov 24 2009

“not fast” is better than “slow”, so I think I’m making progress – better than before.

Updates:

  • I moved to proxmox, which gives me better VM performance so I can have more VMs for my tests; it did take me some time to dig out a usable solution. Now I’m running 4 VMs so I can test fail-over, bootstrap, etc.
  • I moved to Python, since PHP is not that popular now, especially around all these new technologies. I’m a code-by-example guy, so while the whole world is writing code in Java, Ruby, and Python, I don’t have many choices. I picked Python because I don’t want to run things like Tomcat, and built-in web servers don’t convince me (I’m talking about Ruby).
  • I’ve done some simple tests dealing with columns, etc., and the test environment gives me reasonable performance numbers – about 8ms per read/write (the timing loop is sketched after this list).
  • I’m still learning Python and its web stuff; it doesn’t seem that hard to catch up. I’m using web.py, which seems to be the lightest framework; I may be wrong, but I don’t want to dig in more at this moment.
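
For the record, the timing loop is nothing fancy – something along these lines, with do_read_write standing in for the actual column insert/get pair:

    import time

    def do_read_write(i):
        # Hypothetical stand-in for one column insert plus one column
        # get against the Cassandra cluster.
        raise NotImplementedError

    def time_ops(n=1000):
        start = time.time()
        for i in range(n):
            do_read_write(i)
        elapsed = time.time() - start
        # My environment currently reports roughly 8ms per operation.
        print('%d ops in %.2fs, %.1fms per op'
              % (n, elapsed, elapsed / n * 1000))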

To-do list:

  • I need to figure out whether Ubuntu is still the way to go for my virtualization environment; I worry that proxmox is not a major player in this area, so it may ruin my long-term plan.
  • I need to find out whether there is a better HTTP server – “better” here means light, with wsgi support.
  • I’m going to compose some test scripts dealing with super columns, which is what I need to use for the statistics project.
  • Revisit the original design; both the schema and the workflow may change.

I would like to say everything is on track, though I’m not that fast. I will post updates after this Thanksgiving, as I doubt I will have time for coding during the holiday.

Nov 19 2009

I’ve made some progress, and I want to write it down here so that I can follow up tomorrow (actually, today):

  • Python is not that easy, but it is not that difficult either. I’ve decided to use wsgi plus web.py for my web development; I think wsgi is the right way to go, but web.py is still a question mark – I picked it just because it is simple (a minimal sketch follows this list)
  • Tuned the Apache configuration to support wsgi/web.py. Actually, I was thinking of finding something else that would be lighter; I still need to do more research on that, but since I’m using wsgi, I don’t think changing the web server will affect anything other than deployment.
  • I found a place to host subversion, for free. Using version control makes it easy to track changes, and a remote repository will make sure my stuff is safe. A free service does not guarantee 100% reliability, but considering I already have a local copy, that’s acceptable.
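
For reference, the minimal wsgi + web.py setup looks roughly like this (the module and class names are just examples):

    import web

    urls = ('/', 'Index')

    class Index:
        def GET(self):
            return 'hello from web.py'

    app = web.application(urls, globals())

    # Expose a WSGI callable for Apache (mod_wsgi looks for a name
    # called "application"); since any WSGI-speaking server can host
    # this, swapping web servers later should only affect deployment.
    application = app.wsgifunc()

    if __name__ == '__main__':
        app.run()  # web.py's built-in development server
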
Nov 17 2009

Here is the deal – I decided to drop PHP and move to Python, so that I can spend less time dealing with poorly-supported PHP (in this nosql wave). I’ve removed PHP from all dev/test environments, and I hope I won’t come back to it later.

Actually, I had made PHP work, but I just wasn’t feeling good about it, as not many people are using PHP here and it seems hard to find help when needed. Also, setting up Python with Apache (through wsgi) doesn’t seem that difficult. It could be a good chance to learn Python as well, though I did some PyS60 a while ago (for Jabber on an E90).

BTW, I’ve upgraded all Fedora instances (3 of them) to Fedora 12, so far so good.

Nov 13 2009

Finally, I had to build my own instead of doing a package installation, and building Thrift is not that difficult (once you’ve gone through it once …).

You definitely need to read the requirements for building Thrift, but things are not quite clear at first glance – the dependency list is incomplete. So here is what I installed before “./bootstrap.sh; ./configure; make; sudo make install”. Note this is the package list for Fedora, but it should be similar for Ubuntu:

  • subversion
  • gcc-c++
  • java-1.6.0-openjdk-devel
  • perl-devel
  • python-devel
  • php-devel
  • mono-devel
  • boost-devel
  • libtool
  • bison
  • flex
  • perl-ExtUtils-MakeMaker

With all of these installed, it will work like a charm.

Nov 13 2009

I’ve deployed Cassandra to my development environment, running on 4 servers with a replication factor of 2. I picked the numbers 4 and 2 because they are more like a real-world setup, and they are my friend’s requirement. I can test fail-over, etc. later on.

I’ve also composed some scripts for service management – the script can start/stop/restart Cassandra gracefully, and it can also report the status of the node and the cluster. It’s a simple shell script; I will make it a service under Fedora and an init.d script under Ubuntu (I’m running only these two platforms now). Cassandra was upgraded from 0.4.1 to 0.4.2 a few days ago, and I used that as a chance to test my deployment stuff – it went pretty well. I just need to be careful with the 0.4.x => 0.5.x upgrade, since it may break compatibility in the configuration and on the command line.

I’ve converted my PowerBook into a dedicated client machine running Fedora … it’s a pretty old machine, and it seems Apple doesn’t want to roll out new software (such as JDK 1.6) for it, so I surveyed around and picked Fedora (Ubuntu is not ppc-friendly – the support is community-based).

And finally, I have Thrift/PHP up and running. At the very beginning I was thinking I should use Java for the client, but later found that I have no idea how to develop a Java-based web application. Since Thrift says it supports C/C++, Java, Perl, PHP, Ruby, and Python pretty well, I should just pick my favorite language for the tests and then let real clients pick what they want to use (where is my client, BTW? 🙁).

And yes, I confirmed the schema (though Cassandra is a schema-less thing); I’m going to test it with the PHP client today. After that, I will have to find a place to hold all my code, configuration, etc. in subversion, and based on what I’ve found so far, github.com is the best candidate.

Will update here today or tomorrow.

Nov 09 2009

One of my friends asked me if this service is doable:

  • Every hour, multiple machines (clients) will send ~1M action records to the service
  • Each action contains: user id, action, action result, start time, end time, and a couple of user profile keys
  • The service should be able to deliver reports for any given time frame (in minutes) within 15 minutes after the period ends. For example, the report for 3pm~3:30pm should be available by 3:45pm
  • Reports include: how many users did a specific action with a specific result (during that time frame), how many actions a specific user did, and the top actions taken by other users who did the same action as a given user (hard to describe, but think of Amazon’s “Customers who bought this item also bought …”)
  • Most importantly: all these requirements should be met with no more than 4 machines including redundancy, which means the work itself should be done on 2 or 3 machines

I have to say, these are pretty common requirements for online services (shopping, search, gaming … could be anything), and if I can make it work and make the solution scale linearly, it will be pretty interesting (say, 40 machines supporting 10M actions per hour – a lot already). Quick arithmetic below.
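
Back-of-envelope arithmetic on the write rate, using the numbers above:

    # 1M action records per hour is a fairly modest sustained write rate.
    actions_per_hour = 1000 * 1000
    print('%.0f writes/sec for 1M actions/hour'
          % (actions_per_hour / 3600.0))  # ~278

    # If the solution scales linearly: 10M actions/hour over 40 machines.
    print('%.0f writes/sec per machine at 10M/hour on 40 boxes'
          % (10 * actions_per_hour / 3600.0 / 40))  # ~69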

I will do some research (the nosql stuff could fit this well) and post my thinking/design here.