Jan 15, 2014

The search team is doing crawling, and some websites rely heavily on JavaScript to generate content. By “heavily” I mean that none of the UI elements come from the HTML; instead, JavaScript runs after the page has loaded and builds what is shown to users.

I’m building a prototype that they can use as a reference and later turn into something that fits their system better. The prototype is based on PhantomJS, which is in the Ubuntu (12.04 LTS) repository, and that makes my life much easier. Again, I need to install xvfb so that I can run an X-based application from the command line.
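As a rough sketch of how such a prototype might be driven from a crawler, the Python below wraps a PhantomJS call in `xvfb-run`. The script name `render.js` and the helper names are hypothetical placeholders, not part of the actual prototype; the script is assumed to load the URL, wait for the JavaScript to finish, and print the resulting HTML.

```python
import shutil
import subprocess


def build_render_command(url, script="render.js", use_xvfb=True):
    """Return the argv list for rendering `url` with PhantomJS.

    `render.js` is a hypothetical PhantomJS script (an assumption here).
    """
    cmd = ["phantomjs", script, url]
    if use_xvfb:
        # xvfb-run starts a temporary virtual X server for the command;
        # -a lets it pick a free display number automatically.
        cmd = ["xvfb-run", "-a"] + cmd
    return cmd


def render(url, script="render.js", timeout=60):
    """Run the command and return whatever the script wrote to stdout."""
    cmd = build_render_command(url, script)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " is not installed")
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout
```

Keeping the command construction separate from the subprocess call makes it easy to check the argument list without having PhantomJS or xvfb installed.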

Feb 06, 2013

I’m trying my best to use as few EC2 hosts as possible, and it seems I’ve successfully got all of my stuff running on … one micro instance 😀 .

Actually, it surprised me a little to realize that I was running 2 large, 1 small, and 3 micro instances back last September; obviously I spent too much on “playing” 🙁 .

Now I’m waiting for all the crawlers to move over to the new machine, and I’ve found that some crawlers are really … stupid? They didn’t move to the new machine even after the DNS had been updated for more than 12 hours.
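One way to see which crawlers are still hitting the old machine is to count user agents in its access log. This is only a minimal sketch: it assumes the web server writes the common "combined" log format (referer and user agent as the last two quoted fields), and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, where each line ends with
# "referer" "user-agent".
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per user agent; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines logged on the old host after the DNS change, for example `count_user_agents(open(path))` with `path` pointing at that log, and then calling `.most_common()` on the result would list the stragglers first.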