Nov 19 2009
 

I’ve made some progress, and I want to write it down here so that I can follow up tomorrow (actually, today):

  • Python is not that easy, but it is not that difficult either. I’ve decided to use WSGI plus web.py for my web development. I think WSGI is the right way to go, but web.py is still a question mark – I picked it just because it is simple.
  • Tuned the Apache configuration to support WSGI/web.py. I was actually thinking of finding a lighter web server and still need to do more research on that, but since I’m using WSGI, I don’t think changing the web server will affect anything other than deployment.
  • I found a place to host Subversion for free. Version control makes it easy to track changes, and a remote repository keeps my stuff safe. A free service does not guarantee 100% reliability, but since I already have a local copy, that’s acceptable.
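
To illustrate why the web server should matter little under WSGI, here is a minimal sketch using only the Python standard library. It is not the code from this project: the names `call_app` and `serve`, the port, and the response text are made up for illustration. The only contract a WSGI server relies on is the callable itself.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app: any WSGI server calls this once per request."""
    body = b"Hello from WSGI\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Hypothetical helper: invoke the app directly, with no server at all."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(host="127.0.0.1", port=8080):
    """Run the same app under the stdlib reference server (port is arbitrary)."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()
```

The same `application` callable can be handed to Apache's mod_wsgi (which looks for that name by default) or to a lighter server, so swapping servers should mostly be a deployment change.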
Nov 18 2009
 

I used to run 4 VMs on my dual-core machine, though none of them was running anything serious. A while back I read an article from Ubuntu about its cloud setup, saying you should not run more VMs than the number of cores you have. I laughed at the article at the time, since it sounded ridiculous: as long as the CPU can handle the load, we should be able to run as many VMs as we like, and the bottleneck seemed to me to be memory.

Obviously I was wrong. Once I started testing Cassandra, I got lots of timeouts across the 4 VMs. I checked memory and there was no problem at all, i.e. every VM’s memory was in physical memory. I also suspected that Ubuntu was not doing as well as Fedora, and obviously I was wrong there too. Then I recalled the article, shut down 2 VMs and … you know, things worked like a charm.

There could be some paper explaining why the CPU is the bottleneck, but I don’t have time to dig it out. The most important lesson I took from this case is: don’t take anything for granted, and respect professional advice.

Nov 17 2009
 

Here is the deal: I decided to drop PHP and move to Python, so that I can spend less time dealing with the less-supported PHP (in this nosql wave). I’ve removed PHP from all dev/test environments and hope I won’t have to come back to it later.

Actually I’ve made PHP work, but I’m just not comfortable with it, as not many people seem to be using PHP here and it seems hard to get help when needed. Also, setting up Python with Apache (through WSGI) does not seem that difficult. It could be a good chance to learn Python as well, though I did some PyS60 a while ago (for Jabber on the E90).

BTW, I’ve upgraded all Fedora instances (3 of them) to Fedora 12, so far so good.

Nov 13 2009
 

Finally I had to build my own Thrift instead of installing a package, and building Thrift is not that difficult (after you go through it once …).

You definitely need to read the requirements for building Thrift, but they are not quite clear at first glance, and neither is the dependency list. So here is what I installed before running “./bootstrap.sh; ./configure; make; sudo make install”. Note that this is the package list for Fedora, but it should be similar on Ubuntu:

  • subversion
  • gcc-c++
  • java-1.6.0-openjdk-devel
  • perl-devel
  • python-devel
  • php-devel
  • mono-devel
  • boost-devel
  • libtool
  • bison
  • flex
  • perl-ExtUtils-MakeMaker

Once all of these are installed, it works like a charm.
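
As a rough sketch, the package list and build steps above can be kept in one place and turned into a single install command. The helper names `install_command` and `as_shell` are made up for illustration, `yum` is assumed as the Fedora package manager, and the C++ compiler package is written as `gcc-c++`, which is its name on Fedora. Nothing is executed here; the script only builds the command text.

```python
import shlex

# Fedora package names from the list above (C++ compiler package: gcc-c++).
THRIFT_BUILD_DEPS = [
    "subversion", "gcc-c++", "java-1.6.0-openjdk-devel", "perl-devel",
    "python-devel", "php-devel", "mono-devel", "boost-devel",
    "libtool", "bison", "flex", "perl-ExtUtils-MakeMaker",
]

# Build steps quoted from the post, run from the Thrift source directory.
BUILD_STEPS = ["./bootstrap.sh", "./configure", "make", "sudo make install"]


def install_command(packages, manager="yum"):
    """Return the install command as an argument list (nothing is run)."""
    return ["sudo", manager, "install", "-y", *packages]


def as_shell(argv):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)


def build_script():
    """Full sequence: install dependencies, then the four build steps."""
    return [as_shell(install_command(THRIFT_BUILD_DEPS))] + BUILD_STEPS
```

Printing `"\n".join(build_script())` gives the sequence to paste into a shell. On Ubuntu the package names will differ, so the list would need to be adapted rather than reused as-is.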

Nov 13 2009
 

I’ve deployed Cassandra to my development environment, running on 4 servers with a replication factor of 2. I picked the numbers 4 and 2 because that is closer to a real-world setup, and it is the requirement from my friend. I can test fail-over etc. later on.

I’ve also written some scripts for service management. The script can start/stop/restart Cassandra gracefully, and it can also report the status of the node and the cluster. It’s a simple shell script; I will make it a service under Fedora and an init.d script under Ubuntu (I’m running only these two platforms now). Cassandra was upgraded from 0.4.1 to 0.4.2 days ago, and I used that as a chance to test my deployment scripts, which seem pretty good. I think I just need to be careful with the 0.4.x => 0.5.x upgrade, since it may break compatibility in configuration and command line.

I’ve converted my PowerBook to a dedicated client machine running Fedora … it’s a pretty old machine and it seems Apple does not want to roll out new software (such as JDK 1.6) for it, so I did some research and picked Fedora (Ubuntu is not PPC-friendly; the support is community-based).

And finally I have Thrift/PHP up and running. At the very beginning I thought I should use Java for the client, but later on I found that I have no idea how to develop a Java-based web application. Since Thrift says it supports C/C++, Java, Perl, PHP, Ruby and Python pretty well, I should just pick my favorite language to do the test and then let real clients pick what they want to use (where is my client, BTW? :( ).

And yes, I confirmed the schema (though Cassandra is a schema-less thing), and I’m going to test the schema with the PHP client today. After that I will have to find a place to hold all my code, configuration, etc. in a Subversion repository, and based on what I’ve found so far, github.com is the best candidate.

Will update here today or tomorrow.

Nov 09 2009
 

One of my friends asked me if this service is doable:

  • Every hour, multiple machines (clients) will send ~1M action records to the service
  • Each action contains: user id, action, action result, start time, end time, and a couple of user profile keys
  • The service should be able to deliver reports for any given time frame (in minutes) within 15 min after the period finishes. For example, the report for 3pm~3:30pm should be available by 3:45pm
  • Reports include: how many users did a specific action with a specific result (during that time frame), how many actions a specific user did, and the top actions taken by other users who did the same action as the user (hard to understand, but think about Amazon’s “Customers who bought this item also bought…”)
  • Most important of all: all these requirements should be met on no more than 4 machines, including redundancy, which means the work itself should be doable on 2 or 3 machines
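
To make the three reports concrete, here is a small in-memory sketch. It is only my reading of the requirements, not a design: the `Action` record, the field names, and the function names are all made up for illustration, the profile keys are left out, and a real solution would not hold an hour of data in a Python list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    user_id: str
    action: str
    result: str
    start: int  # epoch seconds (assumed representation)
    end: int


def in_window(records, t0, t1):
    """Records whose start time falls in the half-open frame [t0, t1)."""
    return [r for r in records if t0 <= r.start < t1]


def users_with_result(records, action, result):
    """How many distinct users did `action` with `result`."""
    return len({r.user_id for r in records
                if r.action == action and r.result == result})


def actions_per_user(records, user_id):
    """How many actions a specific user did."""
    return sum(1 for r in records if r.user_id == user_id)


def also_did(records, user_id, action, top_n=3):
    """Top other actions by other users who also did `action`."""
    peers = {r.user_id for r in records
             if r.action == action and r.user_id != user_id}
    counts = Counter(r.action for r in records
                     if r.user_id in peers and r.action != action)
    return counts.most_common(top_n)


records = [
    Action("u1", "view", "ok", 0, 1),
    Action("u2", "view", "ok", 10, 11),
    Action("u2", "buy", "ok", 20, 21),
    Action("u3", "view", "fail", 30, 31),
    Action("u3", "buy", "ok", 40, 41),
    Action("u3", "search", "ok", 50, 51),
    Action("u1", "view", "ok", 4000, 4001),  # outside the 30-minute frame
]
window = in_window(records, 0, 1800)
```

With this sample data, `also_did(window, "u1", "view")` ranks "buy" ahead of "search", which is the "also bought" style of report from the last bullet.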

I have to say, this is a pretty common requirement for online services (shopping, search, gaming, … could be anything), and if I can make it work and make the solution scale linearly, then it will be pretty interesting (let’s say, 40 machines supporting 10M actions per hour, which is a lot already).

I will do some research (the nosql stuff could well fit this one) and post my thinking/design here.
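
A quick back-of-the-envelope check of these numbers, as a sketch. It assumes the ~1M records per hour is the total across all clients and that load spreads evenly; the function and constant names are made up for illustration.

```python
import math

BASE_ACTIONS_PER_HOUR = 1_000_000  # assumed to be the total, not per client
BASE_MACHINES = 4


def per_second(actions_per_hour):
    """Average write rate, ignoring bursts."""
    return actions_per_hour / 3600


def machines_needed(actions_per_hour):
    """Machines required if capacity scales linearly from the base case."""
    return math.ceil(actions_per_hour * BASE_MACHINES / BASE_ACTIONS_PER_HOUR)


rate = per_second(BASE_ACTIONS_PER_HOUR)   # roughly 278 records per second
per_machine = rate / BASE_MACHINES         # roughly 69 per machine
scaled = machines_needed(10_000_000)       # the 40-machine case from the text
```

So the average rate is modest, under 300 records per second. The harder parts are probably the 15-minute reporting deadline and keeping the scaling linear.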

Nov 07 2009
 

I made almost no progress with the nosql stuff, and the worse thing is I don’t know why. It seems I cannot concentrate because of one problem or another.

However, I did spend time reading all sorts of documents. Though I cannot give the exact reasons, I’m going to focus on the following projects:

  • Cassandra: it impressed me with its fail-over features (built-in replicas, auto bootstrap, etc.)
  • CouchDB: way too many people have mentioned its name, though feature-wise I don’t think it’s enough (built-in replication, but not replicas)

There are two second-tier projects I will take a look at once I have time:

  • MongoDB: it is said to be quite similar to CouchDB, but lacking some features
  • MemcacheDB: it is more like a caching engine with persistent storage, but it could be one of the most reliable key/value DBs

Cassandra is written in Java, while CouchDB is written in Erlang. I’m not 100% familiar with either, pity :(.

Nov 06 2009
 

Finally made some progress, I’m 301 now.

BTW, lots of people have left in the past couple of weeks, including Rasmus Lerdorf.

 Posted at 17:57
Nov 01 2009
 

Some interesting numbers about my blog:

  • I started blogging here on 2/10/2009
  • There are 238 blog entries already, about 0.9 entries per day, which seems good
  • 81 comments, about one comment every three posts, which means fewer people read/care about it
  • 3,959 spam comments detected by Akismet, about 16 per post. Is my blog that hot?
  • 1 category, 125 tags. I’ve given up the idea of setting up categories, but in return I got too many tags: every two posts create a new tag. I need to bring this number down by reviewing the tag list
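
The ratios above can be reproduced with a few lines, as a sketch. It assumes 2/10/2009 means February 10, 2009 and that the counts were taken on the date of this post (November 1, 2009); the function name is made up for illustration.

```python
from datetime import date


def blog_ratios(posts=238, comments=81, spam=3959, tags=125,
                first_post=date(2009, 2, 10),   # assumed: February 10
                counted_on=date(2009, 11, 1)):  # assumed: date of this post
    """Derive the per-day and per-post ratios from the raw counts."""
    days = (counted_on - first_post).days
    return {
        "days": days,
        "posts_per_day": posts / days,
        "posts_per_comment": posts / comments,
        "spam_per_post": spam / posts,
        "posts_per_tag": posts / tags,
    }


ratios = blog_ratios()
```

Under those assumptions the blog has been running for 264 days, and the derived values match the rounded figures in the list.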

Roughly speaking: I’m actively blogging; spam engines care about my blog more than readers do; even if lots of people read my blog, few find it interesting, so few people comment on it; and either I didn’t manage my blog well or it spreads across too many areas, so I have too many tags.

 Posted at 14:36
Nov 01 2009
 

Halloween! So the holiday season starts.

I went shopping today and it seems the shopping season has started as well. I even saw one of the shopping malls set up a huge Christmas tree. It’s good to see people happy and willing to spend money :P.

I hope everything goes better in the coming days.

 Posted at 14:24