Feb 10, 2012
 

I believe I’ve set up the majority of my terminology extraction demo site. There are still several parts that are glued together rather than tightly integrated, but it works.

I think it’s time to get into the science part: I need to dig out which terms should be returned when I get dozens or hundreds of candidates from a web page. The current algorithm is pretty simple (and may not make sense at all) and is just for testing purposes: sort by tf-idf, with the title’s tf-idf weighted 3 times higher than the content’s.
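A minimal Python sketch of that kind of ranking, for illustration only. The post doesn’t say which tf-idf formula the demo uses or how the title and content scores are combined, so this assumes one common variant (term frequency normalized by field length, smoothed idf = ln((1 + N) / (1 + df)) + 1) and assumes the two scores are simply added with the title score multiplied by 3. `field_tfidf`, `rank_terms`, and the document-frequency table are hypothetical names, not the demo’s actual code.

```python
import math
from collections import Counter

# Assumption: the title's tf-idf counts 3x as much as the content's.
TITLE_WEIGHT = 3.0


def field_tfidf(terms, doc_freq, num_docs):
    """Return {term: tf * idf} for one field (title or content) of a page.

    Hypothetical helper. tf is the raw count divided by the field length;
    idf uses the smoothed form ln((1 + N) / (1 + df)) + 1, so it stays
    positive even for terms that appear in every document.
    """
    counts = Counter(terms)
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        tf = count / total
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))) + 1
        scores[term] = tf * idf
    return scores


def rank_terms(title_terms, content_terms, doc_freq, num_docs, top_k=10):
    """Rank candidate terms by content tf-idf plus weighted title tf-idf."""
    combined = Counter()
    for term, score in field_tfidf(content_terms, doc_freq, num_docs).items():
        combined[term] += score
    for term, score in field_tfidf(title_terms, doc_freq, num_docs).items():
        combined[term] += TITLE_WEIGHT * score
    return [term for term, _ in combined.most_common(top_k)]


if __name__ == "__main__":
    # Made-up document frequencies over a made-up corpus of 100 pages.
    df = {"solr": 2, "terminology": 5, "the": 100}
    title = ["solr", "terminology"]
    content = ["the", "the", "solr", "extraction", "terminology", "the"]
    print(rank_terms(title, content, df, num_docs=100))
    # → ['solr', 'terminology', 'extraction', 'the']
```

Adding the weighted title score to the content score means a term that appears only in the title can still outrank a content-only term. Whether that is the right behavior for terminology extraction is exactly the kind of question the "science part" has to answer.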

Anyway, I don’t want to go into too many details here, at least for now. I still need to redo those glued-together parts in a better way.

The demo is here: http://solr.xiehang.com/. Note that it may be taken down at any time without notice, since I don’t want to leave such an easy-to-abuse entry point on my servers.

  One Response to “Science stage now”

  1. I disabled the publicly accessible web pages so that I can keep my AWS bill within budget, hehe.

    I will move everything back to the office later on (next Monday?). This needs to happen quickly, since I see the weekly report already includes this part.
