{"id":1256,"date":"2012-02-08T19:01:20","date_gmt":"2012-02-09T02:01:20","guid":{"rendered":"http:\/\/xiehang.com\/blog\/?p=1256"},"modified":"2014-01-28T11:10:50","modified_gmt":"2014-01-28T18:10:50","slug":"nutch-solr-term-vector-and-so-on","status":"publish","type":"post","link":"https:\/\/xiehang.com\/blog\/2012\/02\/08\/nutch-solr-term-vector-and-so-on\/","title":{"rendered":"nutch, solr, term vector, and so on"},"content":{"rendered":"
I just finished a prototype of terminology extraction based on nutch<\/a> and solr<\/a>; check the test page<\/a>.<\/p>\n I also have another (quick and dirty) script to inject new URLs into nutch and then solr. The whole demo is not finished yet, since I still need to put together something to remove outdated pages (what counts as outdated?).<\/p>\n The workflow should be something like this:<\/p>\n Some improvements I have in mind include:<\/p>\n I guess I will still need a hadoop cluster, even if it is small; at least I can hand off part of my SLA problem to it >:).<\/p>\n","protected":false},"excerpt":{"rendered":" I just finished a prototype of terminology extraction based on nutch and solr; check the test page. I also have another (quick and dirty) script to inject new URLs into nutch and then solr. The whole demo is not finished yet, since I still need to put together something to remove outdated pages (what counts as outdated?). The work […]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[230,286,285,284,287],"_links":{"self":[{"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/posts\/1256"}],"collection":[{"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/comments?post=1256"}],"version-history":[{"count":4,"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/posts\/1256\/revisions"}],"predecessor-version":[{"id":1661,"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/posts\/1256\/revisions\/1661"}],"wp:attachment":[{"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/media?parent=1256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/xiehang.com\/blog\/wp-jso
n\/wp\/v2\/categories?post=1256"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/xiehang.com\/blog\/wp-json\/wp\/v2\/tags?post=1256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}