
Tiny computers

I’m thinking of another plan for my home development environment: instead of running a couple of VMs on a single big box, maybe I can use a couple of small computers to get things done.

After checking online, this looks possible, though the price may be higher. I’m still looking for a single-board computer or client computer that is about as small as a CD box (or a little bit bigger). It does not have to come with a disk, as I can serve its storage over NFS, and it does not have to be x86-based, as I can run Linux on almost any sort of CPU.

Will see.

Go on making it a mess

Here is the current setup:

  • data server d1: Fedora 12
  • data server d2: openSUSE 11.2
  • data server d3: CentOS 5.4
  • shared host f5: Debian 5.0.3; this machine acts as the NFS server, LVS director, and login bridge box
  • client machine c1: Ubuntu 9.10; this is the pure client host, where I do all development and initiate testing traffic
  • client machine c2: Gentoo (base system release 1.12.13); this is actually the web server, running Apache + wsgi

d1~d3 and f5 run on the dedicated box under Proxmox, while c1 and c2 run on a Windows Vista machine under VirtualBox.

Making it a mess :D

I was running everything on Fedora (7 VMs), but gradually I changed some of them to different distros … just for fun.

Now I have two Fedora 12, one CentOS 5.4, one openSUSE 11.2, one Debian 5.0.3, and two Ubuntu 9.10. I’m thinking about converting one of the Fedora machines and one of the Ubuntu machines to something else, but haven’t decided yet.

Plan for the new year

2009 is a tough year for everybody; I just hope the world gets better in 2010.

My plan for 2010: let’s make this a casual talk. I don’t want to promise anything, since times are so dynamic that everything can change in days. However, if things go smoothly, I hope I can:

Try to find something interesting to do as my job … no details yet. I hope I can stay with my current company, but in case that becomes impossible, I may think about a different industry.

The next thing that comes to mind is re-arranging the development environment at home. Whenever I say “re-arrange” I actually mean … more :D . In an ideal world (though the world can never be ideal) I will retire the P4 box and replace it with an iMac, and get an 8-core machine as my dedicated testing environment, which currently runs on a dual-core desktop. Then I will upgrade the 4-core desktop to Windows 7, and retire the PowerBook and the Asus laptop (1.1G …) as well. This way I will have only 5 machines at home, but with 8 more cores.

If I can get an 8-core box, I will run 10~15 VMs on it, so that I can test all sorts of cluster ideas. Also, I’m going to use different distros (again), including CentOS/Fedora and Debian/Ubuntu.

I hope to get familiar with Python for web development. I should evaluate a framework, though I don’t think I’m going to use it anyway; but if there is any chance, I should build a site with the framework, to make it my RAD solution for web development :) . As for Python, I also need to try writing extensions in C/C++, so that if a performance bottleneck comes from Python I can get past it.

I don’t know if I will be interested in doing desktop development again. If I am, it could be time to try Delphi again, but I can imagine that after a couple of days of excitement I will move to Visual C++ because of Delphi’s buggy system (please, Delphi, please make yourself better, or just workable). I don’t have any specific project so far, but I could at least re-write the P2P webcam thing. Java can also be an alternative for desktop development. I will definitely NOT use Java on the Web as it is just horrible, but it may fit the desktop, who knows.

Hobby study: try Ada and Lisp (yeah, again). I will check whether I can use Ada as a server-side programming language, and try to write some small client-side applications. I’m not confident about working on these two, but whenever I have time, I will do something.

There are some things purely for learning, as I don’t want to spend money on trying them out: all those cloud services like Amazon S3, EC2, Google App Engine, and maybe something from Microsoft, etc. I need to at least understand how these things work (i.e. the operations side), so that once I need them, I won’t go in blind. I may also try out Ubuntu’s private cloud (is this its name?) and see how it can fit into an enterprise environment.

I don’t want to put any personal/family/friend plans here; it’s too hard to plan and also involves privacy issues :D . Roughly speaking, I hope I can spend more time with the people around me and make them happy. I want to get some money to pay off part of my mortgage, or get a new car, but I have never had a good plan for making money other than salary, so it could be just a wish.

Relationship is the bottleneck

I’m testing a prototype that uses Cassandra as the back-end storage. The simple application does user authentication: it logs a user in, gets the user’s profile, and shows the details on a web page.

I hit a performance problem with the buddy-related operations. Every user may have 0~20 buddies, and I want to show each buddy’s last login time on the result page, although I actually retrieve everything for those buddies. The most direct implementation, which I did first, is to create a user object for each buddy to get the data back. Obviously this is not good: for every user object the client needs to access the Cassandra cluster, so a page with 20 buddies pays for up to 20 extra TCP round trips.

Then I added something to the user object’s constructor which loads all buddies’ info in one shot (Cassandra’s multiget_slice API). Things got better, but this doesn’t seem reasonable to me: most of the time (authentication, for example) we don’t need buddy info, and fetching it is just a waste of time.

So I added a new method to the user class, called load_buddies, which loads buddies’ info on demand. This makes authentication pretty fast, but still keeps the ability to load buddies’ info in batch mode.
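
As a rough sketch of the on-demand loading described above: the FakeStore class below is a hypothetical in-memory stand-in for the Cassandra client (its get and multiget methods are made up for illustration, with multiget playing the role of multiget_slice), and the field names are assumptions. It only shows the shape of the idea, not the actual user class.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a Cassandra client; counts round trips."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):
        # One simulated network round trip per key.
        self.round_trips += 1
        return self.rows[key]

    def multiget(self, keys):
        # One simulated round trip for the whole batch.
        self.round_trips += 1
        return {k: self.rows[k] for k in keys}


class User:
    def __init__(self, store, user_id):
        self.store = store
        self.user_id = user_id
        self.profile = store.get(user_id)  # enough for authentication
        self.buddies = None                # not loaded until asked for

    def load_buddies(self):
        """Fetch all buddy rows in one batch, on demand, and cache the result."""
        if self.buddies is None:
            ids = self.profile.get("buddy_ids", [])
            self.buddies = self.store.multiget(ids) if ids else {}
        return self.buddies


rows = {
    "alice": {"buddy_ids": ["bob", "carol"], "last_login": 100},
    "bob": {"buddy_ids": [], "last_login": 90},
    "carol": {"buddy_ids": [], "last_login": 80},
}
store = FakeStore(rows)
user = User(store, "alice")
trips_after_auth = store.round_trips      # only the profile has been read
buddies = user.load_buddies()
trips_after_buddies = store.round_trips   # one more trip for the whole batch
```

Authentication costs one read, and the buddy list costs one more regardless of how many buddies there are; calling load_buddies a second time reuses the cached result.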

After all this the performance is … still not good. My test case has one login failure in every ten requests; for a successfully logged-in user, I show each buddy’s id and last access time, and also update the user’s last login time. With my current settings, the worst response time is about a second, while 90% of requests are done in less than 600 ms.

There must be something that can be tuned, though the VMs could be the reason for the slowness. I will check the following:

  • Apache HTTPd configuration: it seems prefork performs better than worker, and there may be more to tune in both HTTPd and wsgi
  • Python class optimization: I will review the implementation of the user class, as I don’t want to make it too complicated to use
  • Cassandra performance: this is actually what I’m worried about, as during the tests the Cassandra boxes’ CPU utilization is about 80% (roughly 70% user, 10% sys), so it could be the bottleneck

Without the buddy operation everything’s fine: the worst response time is about 600 ms, while 90% of requests are below 400 ms. Relationships are a pain, they are the bottleneck, but in this social era no web application can live without relationships …

BTW, my testing environment:

  • The test client runs on the PowerBook, using ab; I will check if there is anything else that can be useful
  • The servers all run on the same physical box controlled by Proxmox; this includes a web server, an LVS director (to load balance the Cassandra nodes), and 3 Cassandra nodes
  • The server box uses Ethernet, the PowerBook is on wireless. I don’t think this is an issue, as connect time is pretty low.

Thread … sigh

It seems that by default wsgi uses 15 threads per daemon process. It sounded like a cool feature at the very beginning, but later it turned into a nightmare for me.

I’m testing a user authentication system with Cassandra as the backend. Obviously establishing a connection to Cassandra for every request doesn’t make sense at all, so I’m trying to use persistent connections.

The quick and easy solution for me is to establish the connection at class level, so that at least requests handled by the same process can share the connection. So I created a class member to hold the connection, and in the __init__ of every class instance I “ping” the connection with describe_keyspace (assuming it accesses metadata only, which should be fast). If anything goes wrong during the “ping”, I shut down the current connection and establish a new one.

Too bad about the default wsgi configuration: it runs in threaded mode, so by the time I close and re-open the connection for the class (process), there could be other threads using the connection. It seems I can only avoid this with some kind of lock mechanism, which is tough and, frankly speaking, I don’t know how to make it efficient.

So I changed the wsgi configuration to one thread per process. By doing so I can be sure that while I’m doing the “ping” there is only one request for this class (process), so I can safely do whatever I want, as within the class (process) there is no parallel execution.
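
The class-level connection with a “ping” on every instantiation could look roughly like the sketch below. FakeConnection is a hypothetical stand-in for the real Thrift/Cassandra connection, and the UserStore name and the "Users" keyspace are made up for illustration. Note that there is no locking, so this is only safe under the one-thread-per-process setup described above.

```python
class FakeConnection:
    """Hypothetical stand-in for a Thrift/Cassandra connection."""
    opened = 0  # how many connections have been created so far

    def __init__(self):
        FakeConnection.opened += 1
        self.alive = True

    def describe_keyspace(self, name):
        # Cheap metadata call, used here as the "ping".
        if not self.alive:
            raise IOError("connection lost")
        return {}

    def close(self):
        self.alive = False


class UserStore:
    _conn = None  # class member: shared by every instance in this process

    def __init__(self, keyspace="Users"):
        try:
            if UserStore._conn is None:
                raise IOError("not connected yet")
            UserStore._conn.describe_keyspace(keyspace)  # ping
        except IOError:
            # Ping failed (or first use): drop the old connection, open a new one.
            # Not thread safe: assumes one thread per process.
            if UserStore._conn is not None:
                UserStore._conn.close()
            UserStore._conn = FakeConnection()
        self.conn = UserStore._conn


a = UserStore()
b = UserStore()            # ping succeeds, connection is reused
shared = a.conn is b.conn
a.conn.alive = False       # simulate a dropped connection
c = UserStore()            # ping fails, a new connection is opened
```

With several threads in the same process, one thread could close the shared connection while another is in the middle of a call on it, which is exactly the problem that went away with one thread per process.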

The performance is pretty good. I will tune to see how many processes I should run for the wsgi daemon, but with 5 processes I can reach 190 ms average/250 ms maximum for 10 concurrent clients, 380/450 for 20 concurrent clients, and 550/650 for 30, pretty linear in terms of average response time.

It seems, to me at least, that threads are not always a great idea. I cannot tell how much performance gain they would give me, but they at least make things complicated.
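
A quick check of the “pretty linear” claim, using only the numbers above. The throughput estimate is an assumption on my side: it treats the load generator as a closed loop with no think time, so that throughput is roughly concurrency divided by average response time (Little’s law). Written for Python 3.

```python
# concurrency -> (average ms, maximum ms), as measured above
results = {10: (190, 250), 20: (380, 450), 30: (550, 650)}

# Average response time per concurrent client, in ms.
per_client = {n: avg / n for n, (avg, _max) in results.items()}

# Estimated throughput in requests per second (assumes a closed loop, no think time).
throughput = {n: n / (avg / 1000.0) for n, (avg, _max) in results.items()}

for n in sorted(results):
    print(n, round(per_client[n], 1), round(throughput[n], 1))
```

The per-client figure stays at about 18 to 19 ms across all three runs, which under that assumption means throughput is nearly flat at a bit over 50 requests per second: extra clients mostly wait in line rather than get served faster.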

I’m not that fast any more

“not fast” is better than “slow”, so I think I’m making progress, better than before.

Updates:

  • I moved to Proxmox, which gives me better VM performance so that I can have more VMs for my tests; it did take me some time to dig out a usable solution. Now I’m running 4 VMs so I can test fail-over, bootstrap, etc.
  • I moved to Python since PHP is not that popular now, especially with all these new technologies. I’m a code-by-sample guy, so while the whole world is writing code in Java, Ruby, and Python, I don’t have many choices. I picked Python because I don’t want to run things like Tomcat, and built-in web servers do not convince me (I’m talking about Ruby).
  • I’ve done some simple tests dealing with columns, etc.; the test environment gives me a reasonable performance number: 8 ms per read/write.
  • I’m still learning Python and its web stuff, which doesn’t seem that hard to catch up on. I’m using web.py, which seems to be the lightest framework; I may be wrong, but I don’t want to dig in more at this moment.

To-do list:

  • I need to figure out if Ubuntu is still the way to go for my virtualization environment; I’m worried that Proxmox is not a major player in this area, so it may ruin my long-term plan.
  • I need to find out if there is a better HTTP server; “better” here means light and supporting wsgi.
  • I’m going to write some test scripts dealing with super columns, which is what I need to use for the statistics project.
  • Revisit the original design; both schema and workflow may change.

I would like to say everything is on track, though I’m not that fast. I will post updates after Thanksgiving, as I doubt I will have time to code during the holiday.

Late night … some progress

I’ve made some progress, I want to write it down here so that I can follow up tomorrow (actually, today):

  • Python is not that easy, but it is not that difficult either. I’ve decided to use wsgi plus web.py for my web development. I think wsgi is the right way to go, but web.py is still a question mark: I picked it just because it is simple
  • Tuned the Apache configuration to support wsgi/web.py. Actually I was thinking of finding something lighter, and I still need to do more research on that, but since I’m using wsgi I don’t think changing the web server will affect anything other than deployment.
  • I found a place to host Subversion for free. Version control makes it easy to track changes, and a remote repository keeps my stuff safe. A free service does not guarantee 100% reliability, but given that I already have a local copy, it’s acceptable
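
To illustrate why sticking to wsgi should make the web server swappable: a WSGI application is just a callable that takes the request environment and a start_response function, so the same code can in principle sit behind Apache or any other WSGI-capable server. The sketch below is a bare application without web.py, exercised directly with the standard library’s wsgiref helpers; the greeting and the "/login" path are made up for illustration.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI application; mod_wsgi looks for a callable with this name."""
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from %s\n" % path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the application directly, with no web server involved.
environ = {}
setup_testing_defaults(environ)   # fills in the required WSGI keys
environ["PATH_INFO"] = "/login"

captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


response_body = b"".join(application(environ, fake_start_response))
```

Nothing in the application refers to a particular server, which is the point: only the deployment configuration should change when the server does.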

Changes …

Here is the deal: I decided to drop PHP and move to Python, so that I can spend less time dealing with PHP, which is less supported in this NoSQL wave. I’ve removed PHP from all dev/test environments and hope I won’t come back later.

Actually I got PHP working, but I’m just not comfortable with it, as not many people are using PHP for this and it seems hard to get help when needed. Also, setting up Python with Apache (through wsgi) does not seem that difficult. It could be a good chance to learn Python as well, though I did some PyS60 a while ago (for Jabber on E90).

BTW, I’ve upgraded all Fedora instances (3 of them) to Fedora 12, so far so good.

How to build Thrift

Finally I had to build my own instead of installing a package, and building Thrift is not that difficult (after you go through it once …).

You definitely need to read the requirements for building Thrift, but the dependency list is not clear at first glance, so here is what I installed before “./bootstrap.sh; ./configure; make; sudo make install”. Note that this is the package list for Fedora, but it should be similar on Ubuntu:

  • subversion
  • gcc-c++
  • java-1.6.0-openjdk-devel
  • perl-devel
  • python-devel
  • php-devel
  • mono-devel
  • boost-devel
  • libtool
  • bison
  • flex
  • perl-ExtUtils-MakeMaker

After all these are installed, it will work like a charm.
