Sep 18 2014

I’m not sure what I’m going to build yet, but it should be something that makes my ops life easier. I don’t think it will be an automation tool, as the best solution for automation is the command line (scripts).

Maybe something related to monitoring? Building a dashboard would be interesting, though most people may use a web page instead.

Still thinking …

Aug 30 2013

As I mentioned – I’m migrating my sysadm tasks from Perl to Python, and Windows is just one of them.

This is a real case. I like VirtualBox as a neat virtualization container and spend a lot of time using it; there are always 2~3 VMs running on my laptop while another 4~5 lie on the HDD. However, I have different requirements for different types of VMs: I usually launch Windows VMs in GUI mode so that I can use another Windows environment, but I launch Linux and other *nix machines in headless mode and use PuTTY (another neat tool 🙂) to access them.
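The GUI-versus-headless choice can be sketched in Python roughly as below. This is only a guess at the shape of such a script, not the one from the full post: the function names and the VM names are made up for illustration, and it assumes VirtualBox's VBoxManage command is on the PATH.

```python
import subprocess


def start_command(vm_name, os_family):
    """Build a VBoxManage command: GUI for Windows guests, headless otherwise."""
    frontend = "gui" if os_family.lower() == "windows" else "headless"
    return ["VBoxManage", "startvm", vm_name, "--type", frontend]


def start_vm(vm_name, os_family):
    """Run the command; needs VirtualBox installed and VBoxManage on the PATH."""
    return subprocess.call(start_command(vm_name, os_family))


if __name__ == "__main__":
    # Hypothetical VM names; printing the commands instead of running them.
    print(start_command("win7-test", "windows"))
    print(start_command("ubuntu-dev", "linux"))
```

Keeping the command building separate from the actual call makes it easy to check what would be run before touching any VM.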

Aug 22 2013

I’m working on a set of data refresh scripts, which read data from a file, do some transformation, then send it to an HTTP interface. Since the HTTP interface is slow compared with reading and transforming the data, I have several forked child processes to handle the HTTP part.

Everything was done in Perl about 6 months ago, and everything seemed good … until I started picking up Python. My first impression is that the Python program is about 50% of the Perl one in terms of LOC, which makes it easier to read, but honestly I don’t care about this too much as the logic is quite simple. However, when I tested the Python programs and found that they are at least 50% faster than the Perl ones, I felt nervous.

Two examples: where Perl takes 13 seconds, Python takes 5; where Perl takes 34 minutes, Python takes 10. Actually I’m really nervous at this moment, thinking of my poor Python skills. I keep worrying that I got something wrong in the translation (from Perl to Python), even though I have verified the result data quite a few times.
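A minimal sketch of that layout, for illustration only. It swaps in a thread pool for the forked child processes to keep the example short, and the transform and send functions are made-up stand-ins, not the real scripts (a real send would make the HTTP request).

```python
from concurrent.futures import ThreadPoolExecutor


def transform(line):
    """Hypothetical transform: strip whitespace and upper-case the record."""
    return line.strip().upper()


def send(record):
    """Stand-in for the slow HTTP call; returns the record with a fake status."""
    return (record, "ok")


def refresh(lines, workers=4):
    """Transform records up front, then hand the slow sends to a worker pool."""
    records = [transform(line) for line in lines if line.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the input records.
        return list(pool.map(send, records))


if __name__ == "__main__":
    print(refresh(["alpha\n", "beta\n", "\n", "gamma\n"]))
```

The point is the split: the fast part (read and transform) stays in one place, and only the slow part is spread across workers.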

Will dig in after converting all scripts to Python.

Aug 02 2013

I think I’m going to play with Perl, PHP, and Python at almost the same time for a while, and since dynamic graphs are one of the most important features for web pages, I think I need to find a cross-language solution.

I’m happy with PHPlot and have decided not to touch GD::Graph. I came across GDChart, which seems to be good in the cross-platform/cross-language area. I will give it a try and post the result here.

Jul 19 2013

I was assigned to a Web project that presents data analysis results to users. The original data came from Web logs, plus some extra information, then went to Hive, and then populated statistics files after the scientists’ analysis. There are several interesting topics: geo graphs, rendering another web page, and metrics graphs. Roughly speaking, I had no idea about any of these at the beginning of the project.

I think the best decision I made was to use a GD-based solution; actually that could be the only solution I could think of. I decided to use PHP for the Web, plus Perl for batch processing. This turned out not to be quite right: I’m migrating everything to PHP now, as there is not much “real” batch processing and everything could be done in shell. I also decided to use a server-side DOM model (read: PHP DOM) so as not to slow down the project with my poor JS skills; actually my PHP skills are not that good, but my JS is definitely *poor*.

Jul 17 2013

I haven’t dug into the legal issues yet, but for my hobby project I got everything regarding maps from here:

After getting the shp files from the web site, I use the pyshp module:

To extract the data to plain text format so that other programs can read it directly. An old version of pyshp comes with Ubuntu, but it’s sufficient for me.
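The extraction step might look something like the sketch below. The output layout (record fields joined by "|", then a tab, then x,y pairs) and both function names are my own assumptions for illustration; the pyshp calls used are Reader, shapeRecords, and the record and shape.points attributes.

```python
def format_shape(record, points):
    """Render one shape as a text line: record fields, a tab, then x,y pairs."""
    fields = "|".join(str(value) for value in record)
    coords = " ".join("%s,%s" % (point[0], point[1]) for point in points)
    return fields + "\t" + coords


def dump_shapefile(shp_path, out_path):
    """Write every shape in shp_path to out_path, one line per shape."""
    import shapefile  # pyshp; imported here so format_shape works without it

    reader = shapefile.Reader(shp_path)
    with open(out_path, "w") as out:
        for item in reader.shapeRecords():
            out.write(format_shape(item.record, item.shape.points) + "\n")
```

Calling dump_shapefile with the path to a downloaded shp file and an output file name would then give one plain-text line per shape.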

Dec 08 2009

I’m testing a prototype that uses Cassandra as the back-end storage. The simple application does user authentication: it logs a user in, gets the user’s profile, and shows the details on the web page.

I hit a performance problem with buddy-related operations. Every user may have 0~20 buddies, and I want to show each buddy’s last login time on the result page, so I actually retrieve everything for those buddies. The most direct implementation, which I did first, uses a user object per buddy to get the data back. Obviously this is not good: for every user object the client needs to access the Cassandra cluster, and the TCP round trips are a pain.

Then I added something to the user object’s constructor that loads all buddies’ info in one shot (Cassandra’s multiget_slice API). Things got better, but this doesn’t seem reasonable to me: most of the time (such as during authentication) we don’t need buddy info, and getting it back is just a waste of time.

So I added a new method to the user class, called load_buddies, which loads buddies’ info on demand. This makes authentication pretty fast but still keeps the ability to load buddies’ info in batch mode.
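A rough sketch of the on-demand idea, not the prototype’s actual code. FakeStore is a made-up in-memory stand-in for the Cassandra client (its multiget plays the role of multiget_slice), and the field names are assumptions.

```python
class FakeStore:
    """In-memory stand-in for the Cassandra client; counts round trips."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.rows[key]

    def multiget(self, keys):
        # One batched call for many keys, in the spirit of multiget_slice.
        self.round_trips += 1
        return {key: self.rows[key] for key in keys}


class User:
    def __init__(self, store, user_id):
        self.store = store
        self.user_id = user_id
        self.profile = store.get(user_id)  # one round trip, no buddy data
        self.buddies = None

    def load_buddies(self):
        """Fetch all buddy rows in one batched call, only on first use."""
        if self.buddies is None:
            self.buddies = self.store.multiget(self.profile["buddy_ids"])
        return self.buddies


if __name__ == "__main__":
    rows = {
        "alice": {"buddy_ids": ["bob", "carol"], "last_login": "2009-12-01"},
        "bob": {"buddy_ids": [], "last_login": "2009-12-05"},
        "carol": {"buddy_ids": [], "last_login": "2009-12-07"},
    }
    store = FakeStore(rows)
    user = User(store, "alice")
    print(store.round_trips)  # round trips so far, before any buddy data
    print(user.load_buddies()["bob"]["last_login"])
    print(store.round_trips)  # one more for the whole buddy list
```

Authentication only pays for the constructor’s single read; the buddy list costs one extra round trip however many buddies there are, and only when the page actually needs it.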

After all this the performance is … still not good. My test case is one login failure every ten requests; for a successfully logged-in user, I show buddy ids and last access times, and also update the user’s last login time. With my current settings, the worst response time is about a second, while 90% of requests are done in less than 600ms.

There must be something that can be tuned, though the VM could be the reason for the slowness. I will check the following:

  • Apache HTTPd configuration: prefork seems to perform better than worker, and there may be more that can be tuned in both HTTPd and wsgi
  • Python class optimization: I will review the implementation of the user class, as I don’t want to make it too complicated to use
  • Cassandra performance: this is actually what I’m worried about. During the tests the Cassandra boxes’ CPU utilization is about 80% (roughly 70% user, 10% sys), so it could be the bottleneck

Without the buddy operation everything’s fine: the worst response time is about 600ms while 90% of requests are below 400ms. Relationships are a pain; they’re the bottleneck, but in this social era no web application can live without relationships …

BTW, my testing environment:

  • Test client running on a PowerBook, using ab. I will check if there is anything else that could be useful
  • Servers are all running on the same physical box controlled by proxmox; this includes a web server, an LVS director (to load balance Cassandra nodes), and 3 Cassandra nodes
  • The server box uses Ethernet; the PowerBook is on wireless. I don’t think this is an issue as connect time is pretty low.

Nov 30 2009

I just finished A Byte of Python and am planning to read it again and then turn to Dive into Python. I wrote some simple Python programs while I was reading, and everything seems fine so far.

A little girl thought A Byte of Python is a scary book. She thought I had misspelled “bite”, so the book I read became “a bite of a big snake”; I know, it’s scary :D. It took me some time to get her to understand that Python is a computer programming language. I just wish I had made it, but anyway she stopped talking about the book and went back to calling me “lobster killer”.

Yes, I cooked another lobster for the last day of the Thanksgiving holiday.

Nov 24 2009

“not fast” is better than “slow”, so I think I’m making progress, better than before.


  • I moved to proxmox, which gives me better VM performance so that I can have more VMs for my tests. It did take me some time to dig out a usable solution. Now I’m running 4 VMs, so I can test fail-over, bootstrap, etc.
  • I moved to Python since PHP is not that popular now, especially with all these new technologies. I’m a code-by-sample guy, so while the whole world is writing code in Java, Ruby, and Python, I don’t have many choices. I picked Python because I don’t want to run things like Tomcat, and built-in web servers do not convince me (I’m talking about Ruby).
  • I’ve done some simple tests, just dealing with columns, etc.; the test environment gives me a reasonable performance number: 8ms per read/write.
  • I’m still learning Python and its web stuff; it doesn’t seem that hard to catch up, though. I’m using which seems to be the lightest framework; I may be wrong but I don’t want to dig in more at this moment.

To-do list:

  • I need to figure out if Ubuntu is still the way to go for my virtualization environment. I’m worried that proxmox is not a major player in this area, so it may ruin my long-term plan.
  • I need to find out if there is a better HTTP server; “better” here means light, with wsgi support.
  • I’m going to write some test scripts dealing with super columns, which is what I need to use for the statistics project.
  • Revisit the original design; both schema and workflow may need some changes.

I would like to say everything is on track, though I’m not that fast. I will post updates after Thanksgiving, as I doubt I will have time for coding during the holiday.