Science stage now

Feb 10, 2012

I believe I’ve set up most parts of my terminology extraction demo site. Several parts are still glued together rather than tightly integrated, but it works.

I think it’s time to get into the science part: I need to work out which terms should be returned when a web page yields dozens or hundreds of candidates. The current algorithm is pretty simple (and may not make sense at all) and is just for testing purposes: sort by tf-idf, with the title’s tf-idf weighted 3 times higher than the content’s.
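To make the ranking rule concrete, here is a minimal sketch in Python. It is not the demo's actual code. It assumes the per-term tf-idf values for the title and the content have already been computed elsewhere (for example, pulled out of Solr), that "3 times higher" means a multiplier of 3, and that the two scores are simply added together. The function name `rank_terms` and its parameters are made up for illustration.

```python
def rank_terms(title_tfidf, content_tfidf, title_weight=3.0, top_n=10):
    """Rank candidate terms by a weighted sum of title and content tf-idf.

    title_tfidf / content_tfidf: dicts mapping term -> tf-idf score.
    title_weight: assumed multiplier for title scores (3.0 per the post).
    Adding the two weighted scores is an assumption, not a confirmed detail.
    """
    # A term may appear in the title, the content, or both.
    terms = set(title_tfidf) | set(content_tfidf)

    scored = {
        term: title_weight * title_tfidf.get(term, 0.0)
        + content_tfidf.get(term, 0.0)
        for term in terms
    }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up scores, only to show the call shape.
    title = {"solr": 0.5, "demo": 0.2}
    content = {"solr": 0.3, "terminology": 0.9, "demo": 0.1}
    for term, score in rank_terms(title, content):
        print(term, round(score, 3))
```

With these made-up numbers, a term that shows up in the title can outrank a term that scores much higher in the content alone, which is exactly the kind of behaviour that needs checking against real pages.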

Anyway, I don’t want to go into too many details here, at least for now; I still need to redo those glued-together parts in a better way.

The demo is here: http://solr.xiehang.com/. Note that it may be taken down at any time without notice, since I don’t want to leave such an easy-to-abuse entry point on my servers.

One Response to “Science stage now”

Hang says:

2012-02-12 at 00:00

I disabled the publicly accessible web pages so that I can keep my AWS bill within budget, hehe.

I will move everything back to the office later on (next Monday?). This needs to happen quickly, since I can see the weekly report already includes this part.
