Feb 28 2012
 

Just moved to a new office and am still working on the setup. Most likely everything is done, but it needs to be double-checked.

It’s a good office, and until the cubicles are remodeled, I will be sitting in this nice cube with a pretty good view :P.

Feb 27 2012
 

I was working on a simple feature to grab the innerText of some HTML tags in a web page, and it needed to be done in the front end, i.e. in JavaScript. The basic feature took me no more than 5 minutes to finish, but after I switched from Chrome to Firefox, I knew I was in big trouble.

Browser compatibility still seems to be a nightmare nowadays, even though Firefox, Chrome, IE, and Safari all claim to follow some set of standards, and learning those standards is obviously tough.

So I turned to jQuery (and actually tried YUI as well) and it seems pretty cool: everything works smoothly, even on my mobile phone. The feature is not that fancy, so I may not have hit the wall yet, but this is definitely a good sign to me.

A friend of mine told me that their dev team is moving from native apps to HTML5 for quick prototyping and to avoid troublesome incompatibility problems. Is that what I should do?

Feb 21 2012
 

I set up a Solr Cloud but I’m not quite sure how it works … 😀

It seems the problem came from ZooKeeper. The first node, which runs ZooKeeper, needs to upload the configuration files to ZK, and that takes some time. However, in my rc script I could not figure out the right way to check whether the configuration files are already in ZK (I do check the ZK port, though), so every time I start from scratch (meaning everything has been cleaned up) it won’t work.

Luckily, starting from an empty box is not a common case, and even when it happens, simply shutting down the cluster (4 Solr nodes) and restarting it makes everything work.

It seems that in a real production environment, something needs to be done to make all nodes coordinate properly, or, better, to run a standalone ZK cluster (a ZooKeeper ensemble).
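
One way an rc script could wait for the configuration instead of only checking the port is a small polling helper. This is just a minimal sketch, not what my script does: `wait_until` and its parameters are names I made up for illustration, and the actual "is the config in ZK yet?" check is passed in as a function.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0) -> bool:
    """Poll check() until it returns True or the timeout expires.

    check is a hypothetical readiness test supplied by the caller,
    e.g. "does the config znode exist in ZooKeeper yet?".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat errors (e.g. ZooKeeper not reachable yet) as "not ready".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a ZooKeeper client such as kazoo, the check could be something like `lambda: zk.exists("/configs/myconf") is not None`, where the znode path is an assumption about where the configuration ends up. The other Solr nodes would then start only after `wait_until` returns True.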

Feb 17 2012
 

Someone asked me why I don’t post topics about family here. I said no, no family topics here.

The problem is that nowadays there are way too many ways to leak personal information to the public, and I don’t want to add one more way that could hurt my family, even if only potentially.

So, I hope I’ve made that clear, and let’s go back to the geek (or nerd) world.

Feb 14 2012
 

The AWS app on iOS is pretty much done. At least I have made the read-only parts of S3 and EC2 work, and I don’t think write operations (create, update, delete, etc.) are practical on mobile phones.

I’m moving to Windows Phone now, trying to build a similar app in terms of functionality. I’m not sure if I can build it with a similar UI, since different app stores may have totally different tastes and rules on UI standards, so it’s better to follow what they ask for.

I hope Windows Phone development will be easier, considering I’ve done .NET/Visual Studio development before. Let’s see: I’m giving myself 2 weeks, and will spend no more than an hour per day.

Actually I tried both Android and Windows Phone the other day, trying to get a basic idea of the development. Obviously I don’t like Android, either because I’m not a Java fan, or because of the ugly interface of Eclipse. It may only be that ugly on OS X, but I don’t have the space (read: memory) to launch such a fat IDE in my Windows VM, so I’ll leave it for last.

Feb 10 2012
 

I believe I’ve set up the majority of my terminology extraction demo site. There are still several parts that are glued together instead of tightly integrated, but it works.

I think it’s time to get into the science part: I need to figure out which terms should be returned when I get dozens or hundreds of them from a web page. The current algorithm is pretty simple (and may not make sense at all) and is just for testing purposes: sort by tf-idf, with the title’s tf-idf weighted 3 times as much as the content’s.
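
To make the ranking rule concrete, here is a minimal sketch of it in Python. It is not the demo’s actual code: the function names, the smoothed idf formula, and the `doc_freq` statistics are all assumptions for illustration (in the demo, the corpus statistics would come from the Solr index).

```python
import math
from collections import Counter

TITLE_WEIGHT = 3.0  # from the post: the title's tf-idf counts 3 times as much


def tf_idf(tokens, doc_freq, num_docs):
    """Return {term: tf-idf} for one list of tokens.

    doc_freq maps term -> number of documents containing it
    (hypothetical corpus statistics supplied by the caller).
    """
    counts = Counter(tokens)
    total = len(tokens) or 1
    scores = {}
    for term, n in counts.items():
        # Smoothed idf, one common variant; the demo may use another.
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))) + 1
        scores[term] = (n / total) * idf
    return scores


def rank_terms(title_tokens, content_tokens, doc_freq, num_docs, top_n=10):
    """Sort terms by 3 * title tf-idf + content tf-idf, highest first."""
    title = tf_idf(title_tokens, doc_freq, num_docs)
    content = tf_idf(content_tokens, doc_freq, num_docs)
    combined = {
        term: TITLE_WEIGHT * title.get(term, 0.0) + content.get(term, 0.0)
        for term in set(title) | set(content)
    }
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Calling `rank_terms(["solr", "cloud"], page_tokens, doc_freq, num_docs)` returns a list of `(term, score)` pairs, so terms that appear in the title float to the top even if they are rare in the body.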

Anyway, I don’t want to mention too many details here, at least for now. I still need to get those glued parts done in a better way.

The demo is here: http://solr.xiehang.com/. Note that it may be taken down at any time without notice, since I don’t want to leave such an easy-to-abuse entry point on my servers.

Feb 08 2012
 

Just finished a prototype of terminology extraction based on Nutch and Solr, check the test page.

I also have another (quick and dirty) script to inject new URLs into Nutch and then Solr. The whole demo is not finished yet, since I need to put up something to remove outdated pages (what counts as outdated?).

The work flow should be something like this: Continue reading »

Feb 07 2012
 

I hit this problem in my Hadoop-based project:

Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory

Did some research but found nothing useful. Everybody mentioned it’s a JDK problem: the JDK uses fork()+exec() to spawn a new process for a shell command, which can cause an excessive memory allocation. However, it’s weird that I hit this problem only on my AWS micro instance, not on my MacBook, so I went on to check some more.

It turned out the problem was swap: my AWS micro instance did not have swap enabled (i.e. zero swap space). After adding 1 GB of swap, everything is fine now.

I’m a Java newbie, so my question is: though it got solved, did I do it properly?
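
For anyone who wants to check the same thing on their own box, here is a small sketch that reads `SwapTotal` from `/proc/meminfo`. It is Linux-only, and the helper names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def has_swap(meminfo_text):
    """True if the given meminfo text reports a non-zero SwapTotal."""
    return parse_meminfo(meminfo_text).get("SwapTotal", 0) > 0


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel's memory summary (only exists on Linux)."""
    with open(path) as f:
        return f.read()
```

On the micro instance, `has_swap(read_meminfo())` would have returned False before I added the swap space.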