Just finished a prototype of terminology extraction based on Nutch and Solr; check the test page.
I also have another (quick and dirty) script to inject new URLs into Nutch and then Solr. The whole demo is not finished yet, since I still need to put something together to remove outdated pages (what counts as outdated?).
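As a rough illustration of the injection step only (this is not the actual script), here is a minimal Python sketch. It assumes a Nutch 1.x style layout where `bin/nutch inject <crawldb> <url_dir>` reads seed URLs from plain text files in a directory. The function name `prepare_inject`, the `seed.txt` file name, and the `crawl/crawldb` path are all hypothetical.

```python
import tempfile
from pathlib import Path


def prepare_inject(urls, seed_dir, crawldb="crawl/crawldb", nutch_bin="bin/nutch"):
    """Write a seed file and return the Nutch inject command as an argument list."""
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)
    # Drop blank entries and duplicates while keeping the original order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    seed_file = seed_dir / "seed.txt"  # hypothetical file name
    seed_file.write_text("\n".join(unique) + "\n", encoding="utf-8")
    # Assumed Nutch 1.x form: bin/nutch inject <crawldb> <url_dir>
    return [nutch_bin, "inject", crawldb, str(seed_dir)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmd = prepare_inject(["http://example.com/", "http://example.com/"], tmp)
        # Only prints the command; nothing is executed here.
        print(" ".join(cmd))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` from the Nutch install directory. The later steps (fetching, parsing, and pushing documents to Solr) are left out because their exact commands depend on the Nutch version in use.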
The workflow should be something like this: