Posts tagged: python

Relationship is the bottleneck

I’m testing a prototype that uses Cassandra as its back-end storage. The application is simple: it authenticates a user, fetches the user’s profile, and shows the details on a web page.

I hit a performance problem with buddy-related operations. Every user may have 0 to 20 buddies, and I want to show each buddy’s last login time on the result page; in fact I retrieve everything for those buddies. My first and most direct implementation used one user object per buddy to get the data back. That is obviously not good: every user object means another trip to the Cassandra cluster, and the TCP round trips add up.

Then I added something to the user object’s constructor that loads all buddy info in one shot (Cassandra’s multiget_slice API). Things got better, but this doesn’t seem reasonable to me: most of the time (authentication, for example) we don’t need buddy info, so fetching it is just a waste of time.

So I added a new method to the user class, load_buddies, which loads buddy info on demand. This makes authentication pretty fast but still keeps the ability to load buddy info in batch mode.
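A minimal sketch of the on-demand idea, assuming a hypothetical FakeStore in place of the real Cassandra client (its get and multiget methods are made up for illustration; multiget only plays the role that multiget_slice plays above). The User class here is not the actual class from the prototype, just one way the pattern could look:

```python
class FakeStore:
    """Hypothetical stand-in for a Cassandra client; counts round trips."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):
        # One simulated network round trip per key.
        self.round_trips += 1
        return self.rows[key]

    def multiget(self, keys):
        # One simulated round trip for the whole batch.
        self.round_trips += 1
        return {key: self.rows[key] for key in keys}


class User:
    def __init__(self, store, user_id):
        self.store = store
        self.user_id = user_id
        # The constructor loads only the user's own row.
        self.profile = store.get(user_id)
        self._buddies = None  # buddy rows are not loaded yet

    def load_buddies(self):
        """Fetch all buddy rows in one batch, the first time they are needed."""
        if self._buddies is None:
            self._buddies = self.store.multiget(self.profile["buddies"])
        return self._buddies


store = FakeStore({
    "alice": {"last_login": 100, "buddies": ["bob", "carol"]},
    "bob": {"last_login": 90, "buddies": []},
    "carol": {"last_login": 80, "buddies": []},
})
user = User(store, "alice")    # authentication path: one round trip
buddies = user.load_buddies()  # buddy page: one more round trip, not one per buddy
```

Calling load_buddies a second time reuses the cached rows, so the round-trip count stays at two no matter how many buddies the user has.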

After all this the performance is … still not good. My test case has one login failure in every ten requests; for a successfully logged-in user, I show each buddy’s ID and last access time, and also update the user’s last login time. With my current setup the worst response time is about a second, while 90% of requests finish in less than 600ms.
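For reference, this is roughly how a “90% of requests under X ms” figure can be computed from raw timings. It is only a sketch using the nearest-rank method; the timings below are made-up numbers, not my measurements, and the percentile helper is a name I chose for illustration:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up response times in milliseconds, for illustration only.
timings_ms = [120, 180, 210, 250, 300, 340, 410, 520, 590, 980]
p90 = percentile(timings_ms, 90)
worst = max(timings_ms)
```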

There must be something that can be tuned, though the VMs could be the reason for the slowness. I will check the following:

  • Apache HTTPd configuration: prefork seems to perform better than worker, and there may be more to tune in both HTTPd and wsgi
  • Python class optimization: I will review the implementation of the user class, as I don’t want to make it too complicated to use
  • Cassandra performance: this is what actually worries me. During the tests the Cassandra boxes’ CPU utilization is about 80% (roughly 70% user, 10% sys), so it could be the bottleneck

Without the buddy operation everything’s fine: the worst response time is about 600ms while 90% of requests are below 400ms. Relationships are a pain and they’re the bottleneck, but in this social era no web application can live without relationships …

BTW, my testing environment:

  • The test client runs on a PowerBook, using ab; I will check whether anything else could be useful
  • The servers all run on the same physical box, managed by proxmox: a web server, an LVS director (to load balance the Cassandra nodes), and 3 Cassandra nodes
  • The server box is on Ethernet and the PowerBook is on wireless. I don’t think this is an issue, as the connect time is pretty low.

Reading python books

I just finished A Byte of Python and plan to read it again before turning to Dive into Python. I wrote some simple Python programs while I was reading, and everything seems fine so far.

A little girl thought A Byte of Python was a scary book. She thought I had misspelled “bite”, so the book I read became “a bite of a big snake”. I know, it’s scary :D . It took me some time to get her to understand that Python is a computer programming language. I just wish I had managed it, but anyway she stopped talking about the book and went back to calling me “lobster killer”.

Yes, I cooked another lobster on the last day of the Thanksgiving holiday.

I’m not that fast any more

“not fast” is better than “slow”, so I think I’m making progress, better than before.

Updates:

  • I moved to proxmox, which gives me better VM performance so that I can run more VMs for my tests. It did take me some time to dig out a usable solution. Now I’m running 4 VMs, so I can test fail-over, bootstrap, etc.
  • I moved to Python since PHP is not that popular now, especially with all these new technologies. I’m a code-by-sample guy, so while the whole world is writing code in Java, Ruby, and Python, I don’t have many choices. I picked Python because I don’t want to run things like Tomcat, and built-in web servers do not convince me (I’m talking about Ruby).
  • I’ve done some simple tests dealing with columns, etc., and the test environment gives me a reasonable performance number: 8ms per read/write.
  • I’m still learning Python and its web stuff; it doesn’t seem that hard to catch up. I’m using web.py, which seems to be the lightest framework. I may be wrong, but I don’t want to dig in more at this moment.

To-do list:

  • I need to figure out whether Ubuntu is still the way to go for my virtualization environment. I’m worried that proxmox is not a major player in this area, which may ruin my long-term plan.
  • I need to find out whether there is a better HTTP server, where “better” means light and supporting wsgi.
  • I’m going to write some test scripts dealing with super columns, which is what I need for the statistics project.
  • Revisit the original design; both the schema and the workflow may change.

I would like to say everything is on track, though I’m not that fast. I will post updates after Thanksgiving, as I doubt I will have time to code during the holiday.

Late night … some progress

I’ve made some progress, and I want to write it down here so that I can follow up tomorrow (actually, today):

  • Python is not that easy, but it is not that difficult either. I’ve decided to use wsgi plus web.py for my web development. I think wsgi is the right way to go, but web.py is still a question mark; I picked it just because it is simple
  • I tuned the Apache configuration to support wsgi/web.py. I was actually thinking of finding something lighter and still need to do more research on that, but since I’m using wsgi I don’t think changing the web server will affect anything other than deployment.
  • I found a place that hosts subversion for free. Version control makes it easy to track changes, and a remote repository keeps my stuff safe. A free service does not guarantee 100% reliability, but since I already have a local copy, that’s acceptable
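The reason switching web servers should only affect deployment is that a wsgi application is just a callable with a fixed signature. Here is a minimal sketch in plain Python without web.py; the greeting text and the serve helper are my own placeholders, and wsgiref is the standard library’s development server, meant for local testing only:

```python
def application(environ, start_response):
    """A plain WSGI callable; any WSGI-capable server can host it unchanged."""
    body = b"hello from wsgi\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run the app on the stdlib development server (local testing only)."""
    from wsgiref.simple_server import make_server
    with make_server("127.0.0.1", port, application) as server:
        server.serve_forever()
```

Calling serve() starts a local server on port 8000. Apache’s mod_wsgi looks for a callable named application by default, so the same function can be hosted there with no code changes.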

Changes …

Here is the deal: I decided to drop PHP and move to Python, so that I spend less time dealing with PHP, which is less supported in this nosql wave. I’ve removed PHP from all dev/test environments and hope I won’t have to come back later.

Actually I did get PHP working, but I’m not comfortable with it, as not many people are using PHP for this and it seems hard to get help when needed. Also, setting up Python with Apache (through wsgi) doesn’t seem that difficult. It could be a good chance to learn Python as well, though I did some PyS60 a while ago (for Jabber on the E90).

BTW, I’ve upgraded all Fedora instances (3 of them) to Fedora 12, so far so good.

Update on pys60

I didn’t post anything here in the past couple of days as I was busy with pys60 stuff, which was pretty fun, and I made progress.

Here are some issues I solved or partially solved (say, worked around). Some of them may look stupid to experienced S60 or Python developers, but considering I’m new to both …

  1. Access point selection: the old version asked me to select an access point every time it tried to establish a connection to the server. Some articles say that importing the btsocket module as socket solves the problem, but it does not actually work. The right solution is a new feature in pys60 1.9.x (I believe this is the right version): socket itself now supports set_default_access_point, which is similar to btsocket’s method of the same name but takes the name of the access point (a string) as its parameter, which is actually more convenient than btsocket
  2. Loading time seems really long (well, it depends on how many modules are loaded), so putting something like appuifw.note(“something”) at the very beginning, right after a single import appuifw (remember not to import any other modules before it), is much more user friendly
  3. The combo field in Form is way too hard to use: you have to access the combo value through the form object, which is not convenient at all
  4. e32dbm … supports strings ONLY, and it’s better to encode/decode everything with utf-8; otherwise there will be all sorts of problems keeping the encoding in sync.
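To illustrate item 4, here is a sketch of encoding and decoding at the storage boundary. It is written for desktop Python 3 and uses a plain dict as a stand-in for the on-device database, so it is not e32dbm code; Utf8Store and its put/get methods are hypothetical names:

```python
class Utf8Store:
    """Wraps a bytes-only mapping and converts to/from text with UTF-8 at the boundary."""

    def __init__(self, backend):
        self.backend = backend  # any mapping that holds byte strings

    def put(self, key, value):
        # Encode both key and value so the backend only ever sees bytes.
        self.backend[key.encode("utf-8")] = value.encode("utf-8")

    def get(self, key):
        # Decode on the way out so callers only ever see text.
        return self.backend[key.encode("utf-8")].decode("utf-8")


raw = {}                  # a dict stands in for the real database
store = Utf8Store(raw)
store.put("nickname", "Zoë")
name = store.get("nickname")
```

Keeping the conversion in one place means the rest of the program never has to guess which encoding a stored value is in.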

There are some other minor findings, such as the difference between a list (it’s an array!!!) and a map. However, I’m still having problems with the UI. At the moment I want a tab with two text boxes, one showing the conversation (incoming and outgoing messages) and the other for typing a message. So far I have no idea how to build it. Canvas seems to be the direction, but outputting text with line wrapping is way too hard for me.

I will post here once I make any progress, but my guess is that it won’t be significant in the near future.
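As a rough idea of what the wrapping step could look like, here is a sketch that wraps by character count with the standard textwrap module. This is an assumption-heavy simplification: drawing on a Canvas would need widths measured in pixels for the actual font, which this does not do, and wrap_message is a name I made up:

```python
import textwrap


def wrap_message(text, width):
    """Split text into lines of at most `width` characters, breaking at spaces."""
    # textwrap.wrap returns an empty list for empty input, so fall back to one blank line.
    return textwrap.wrap(text, width=width) or [""]


message = "hello there, are you online right now?"
lines = wrap_message(message, 16)
```

Each entry in lines could then be drawn on its own row, one line height apart.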

Making good progress with IM on E90

Hey, I’ve made good progress on E90’s IM (jabber client).

There are two threads that I’m currently working on: one is the UI stuff, which I got entirely from a book; the other is the XMPP protocol, for which I got information from the xmpppy project’s samples (and yes, I’m using this project now as it seems to have minimal external dependencies). The UI is going well, though I haven’t built anything real yet, and I just made some good progress with the XMPP protocol: I’ve been able to log into my test accounts and send/receive messages.

Now I need to speed up the UI work, as I need some basic UI to make things work smoothly (right now I have to shut down my handset to quit the test program, which is SUPER ugly). I will check around, and if there is nothing more useful out there, I will register a new project on sourceforge and hope it becomes my second product-level project (the first one is the mail alert, but I’m no longer using it).

Python for Symbian

Now … Python time again, since it seems to be the easiest way to develop some leisure stuff for my new E90.

So here are things to be installed:

  • ActivePython for Windows; at this moment 2.5.x is needed
  • ensymble for Windows; I’m using 0.28 (for Python 2.5)
  • openssl for Windows; remember to download openssl.zip
  • pys60, currently version 1.9.7; this (the sis file) is for installing on my handset

Installing pys60 on the handset is pretty straightforward, except that you may have to deal with certificates, etc. Please check the Symbian forums for that; I don’t think my solution is the best (allow any software to be installed and don’t check certificates at all).

Now install ActivePython on the Windows box by following the instructions, then create a directory for ensymble (let’s say C:\Ensymble), unzip openssl.zip into that directory (3 files in total), and then copy in ensymble_xxx.py. I renamed it to ensymble.py so that I need to type less whenever I use it. As the last step, add C:\Ensymble to your PATH environment variable.

After everything’s done, let’s try the first Python application for Symbian: pick an open source application from the PyS60 Application Directory and build it. I used Magic Video as my first test. Download the py file, put it somewhere, and then run:

ensymble.py py2sis --uid=0x98765432 --appname="MagicVideo" --caption="Magic Video" --version=1.0.0 magicvideo.py magicvideo

You will get a .sis file in the current directory, which can be installed on the handset.

Remember to use a uid > 0x7fffffff. I didn’t pay attention to the warning message on the screen, and it cost me almost half an hour to figure out why the application could not be installed.
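A tiny check like the following could catch the mistake before building. It only encodes the rule stated above (uid greater than 0x7fffffff); the function name is made up, and it is not part of ensymble:

```python
def uid_in_allowed_range(uid_text):
    """Return True when the hex UID is above 0x7fffffff, the rule noted above."""
    uid = int(uid_text, 16)  # accepts strings with or without the 0x prefix
    if not 0 <= uid <= 0xFFFFFFFF:
        raise ValueError("UID must fit in 32 bits")
    return uid > 0x7FFFFFFF


ok = uid_in_allowed_range("0x98765432")   # the UID used in the command above
bad = uid_in_allowed_range("0x12345678")  # a UID at or below 0x7fffffff
```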

Back to PHP

It seems godaddy supports PHP only, so I’m thinking of stopping learning Python for now even though it has better frameworks. There are some widely used PHP frameworks that I can deploy to godaddy, which is more convenient.

Currently I’m comparing Zend, CakePHP, and Symfony. I guess I won’t try Zend as it sounds too old and does not support application generation. People mention that CakePHP lacks real model support, while Symfony is sort of too complicated to start with.

I will focus on CakePHP and Symfony. I have a feeling that I will stick with CakePHP for no particular reason, but I will certainly do serious research on Symfony as well.

Again, I will post my findings here.

django, lighttpd, …

I’m trying to study Django more, which means getting rid of the limitation of only running tests locally.

I was doing port forwarding and then I thought, “anyway I will try deployment later on, why not just start the trial from now?”

So I started looking around. Of course the first thing that came to mind was mod_python with apache httpd, but my Ubuntu box already has lighttpd running (I have no idea when or why I deployed it), and then I thought, “yeah, apache httpd is way too heavy, let me try lighttpd”.

It turned out lighttpd is not perfect to administer, though it may get me some performance gain. I have to launch fastcgi myself, and in a real production environment that means writing some monitoring (parent) process to make sure the fastcgi server is running all the time.
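A bare-bones sketch of such a parent process might look like this. It simply restarts a child whenever it exits; the supervise function, its parameters, and the demo command are all my own placeholders, and a real setup would need logging, back-off, and signal handling:

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, delay=0.1):
    """Run command and restart it each time it exits, up to max_restarts times.

    Returns how many times the command was started.
    """
    starts = 0
    while starts <= max_restarts:
        starts += 1
        process = subprocess.Popen(command)
        process.wait()     # block until the child exits
        time.sleep(delay)  # short pause before the next start
    return starts


# Demo with a child that exits immediately; the real fastcgi launch
# command would go in its place.
starts = supervise([sys.executable, "-c", "pass"], max_restarts=2, delay=0)
```

With max_restarts=2 the child is started three times in total: once initially and twice more after it exits.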

Anyway, I got django running on lighttpd. There is some URL mess that I still haven’t figured out; it seems django+lighttpd prefers application-per-domain over application-per-URI. I will look into it more to see whether that is the case.
