Jul 19 2013

I was assigned to a Web project that presents data analysis results to users. The original data came from Web logs plus some extra information, went into Hive, and after the scientists’ analysis ended up as statistics files. There were several interesting topics: geo graphs, re-rendering another Web page, and metrics graphs. Roughly speaking, I knew nothing about any of them at the beginning of the project.

I think the best decision I made was to use a GD-based solution; actually, it may have been the only solution I could think of. I decided to use PHP for the Web part plus Perl for batch processing. That turned out not to be quite right: I’m migrating everything to PHP now, since there is not much “real” batch processing and what remains can be done in shell. I also decided to use a server-side DOM model (read: PHP DOM) so that my poor JS skills would not slow down the project. My PHP is not that good either, but my JS is definitely *poor*.

I started by looking for a geo graph solution. Obviously the most important thing is getting the data to draw a map. I soon found that everything on the Web is shapefile based, which meant another round of searching for a workable way to dump the data. I didn’t spend too much time on this … at the beginning I used the open source Quantum GIS software, and later switched to PyShp, which is much more lightweight. GIS software has lots of fancy features, but all I wanted was to read a shapefile and dump the coordinates.
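
A minimal sketch of that dump step, assuming PyShp is installed (it is imported as `shapefile`). The function names `split_rings` and `dump_coordinates` and the one-ring-per-line output format are hypothetical, chosen only for illustration; this is not the exact script used in the project. A shapefile polygon stores all of its points in one list, and `parts` holds the index where each ring starts, so the only real work is slicing that list.

```python
def split_rings(points, parts):
    """Split a flat point list into rings using the shape's part offsets."""
    # Each entry in `parts` is the index where a ring starts; the last
    # ring runs to the end of the point list.
    bounds = list(parts or [0]) + [len(points)]
    return [list(points[start:end]) for start, end in zip(bounds, bounds[1:])]


def dump_coordinates(path):
    """Print one line per ring: the shape index, then its x,y pairs."""
    import shapefile  # PyShp, a third-party package (assumed installed)

    reader = shapefile.Reader(path)
    for index, shape in enumerate(reader.shapes()):
        for ring in split_rings(shape.points, shape.parts):
            print(index, " ".join(f"{p[0]},{p[1]}" for p in ring))
```

Calling `dump_coordinates` on a shapefile path would then produce plain text that a PHP or Perl script can read line by line.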

Once I had the coordinate data, the geo graph became really simple: all I needed to do was read in the data and use GD’s polygon feature to draw the map. Sure, this needs to be hooked up with the statistics data, but I believe it would suit a high-school kid as a science project, or maybe middle school? I’m not sure 😀 .
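
The one step hidden in “read in data, draw polygon” is turning longitude/latitude pairs into pixel positions. Below is a rough sketch of that conversion, assuming a plain equirectangular mapping (longitude to x, latitude to y); the post does not say which projection was actually used, and the name `to_pixels` and its parameters are hypothetical. It returns a flat `[x0, y0, x1, y1, ...]` list because that is the point layout PHP’s `imagefilledpolygon()` takes.

```python
def to_pixels(ring, bbox, width, height):
    """Map (lon, lat) pairs to a flat [x0, y0, x1, y1, ...] pixel list.

    bbox is (min_lon, min_lat, max_lon, max_lat) for the whole map and
    must have non-zero width and height.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    x_scale = (width - 1) / (max_lon - min_lon)
    y_scale = (height - 1) / (max_lat - min_lat)
    flat = []
    for lon, lat in ring:
        flat.append(round((lon - min_lon) * x_scale))
        # Image y grows downward, so measure latitude from the top edge.
        flat.append(round((max_lat - lat) * y_scale))
    return flat


# A 20x20 degree square drawn on a 101x101 pixel image.
square = [(-10, 10), (10, 10), (10, -10), (-10, -10)]
print(to_pixels(square, (-10, -10, 10, 10), 101, 101))
```

Using one bounding box for all regions keeps neighbouring polygons aligned with each other on the final image.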

Re-rendering an existing Web page is definitely tough manual labor. You don’t have to be smart, but you need to be very careful: analyzing the page structure, then deciding where to put something in and where to remove something so that the final page comes out the way you want. The source Web site (another subsidiary of our company) has some interesting anti-abuse tricks, so I had to put proxy calls on almost every request to their resources. Other than that, I cannot remember anything that needed deep research. However, after finishing almost all of this, I believe the right solution is to do everything in JS plus CSS instead of PHP, but I don’t want to change it since I don’t want to step too deep into JS stuff.

The last thing is metric graphs. I started with Perl’s GD::Graph, which is OK: I could get everything done, but it took me too much time. Later I found PHPlot and switched to this pure PHP solution. PHPlot is much easier to understand and also easier to debug (this is not because of the PHP/Perl difference). I think I’m going to stick with PHPlot and never go back to GD::Graph.

Some other thoughts came to mind while I was playing with all of the above. First, jQuery is neat. It makes my life much easier, and no wonder so many JS developers use it and similar libraries. I’m going to dig deeper, especially trying it out with PhoneGap, etc., to see if it can speed up mobile app development. Second, Python came into view again. I think I will use it for my next sys admin task and see how it works with the rest of my world; I’m most interested in its “glue” ability with C/C++ modules. Third, I got a weird idea of rewriting PHPlot in C/C++, for reasons unknown :D. Performance is not a concern to me at the moment; I just don’t like ~8000 lines of PHP code in one file sitting on my web server. But I believe this will stay just an idea, at least for the near future.

Perl needs to improve its ability to handle UTF8 characters, seriously.

  One Response to “How to solve problems”

  1. Forgot to mention wkhtmltopdf, which I used to generate a snapshot of the web site. It’s a neat tool and almost led me to dig into WebKit 😉 .
