Thursday, May 22, 2008, 9:14pm
Eureka! Science News just launched -- it is a site dedicated to providing the very latest science news, but with a special twist -- it is entirely automated! There is no human editor behind it -- it finds relationships between news stories from all major science sites, then regroups, categorizes, ranks and tags them, finds related press releases, and publishes everything directly on the site. The result is an efficient overview of everything happening in science, right when it happens. The following details how we built the site.
...
I identified the principal components of an intelligent news aggregator:
- A source of news, such as an RSS aggregator
- A clustering engine, to group news together
- A classification engine, to categorize the news (Is this Biology, Physics, Medicine or Astronomy?)
- A way to assign scores to clusters, to determine in which order the news should be displayed
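To make those four components concrete, here is a rough sketch of how they could fit together. It is not Eureka's actual code: the excerpt does not say which algorithms they use, so the Jaccard word-overlap clustering, the keyword lists, the 0.4 threshold and the "count distinct sources" score are all stand-ins I picked for illustration, and a hard-coded list of headlines takes the place of a real RSS aggregator.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list; a real system would use a larger one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "is", "by", "with"}

# Assumption: toy keyword lists standing in for a trained classifier.
CATEGORY_KEYWORDS = {
    "Astronomy": {"mars", "planet", "star", "galaxy", "telescope"},
    "Medicine": {"disease", "heart", "drug", "patient"},
    "Biology": {"gene", "species", "cell"},
}


def tokens(title):
    """Return the set of lowercase words in a headline, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(items, threshold=0.4):
    """Greedy one-pass clustering: an item joins the first cluster whose
    seed headline is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for item in items:
        t = tokens(item["title"])
        for c in clusters:
            if jaccard(t, c["seed"]) >= threshold:
                c["items"].append(item)
                break
        else:
            clusters.append({"seed": t, "items": [item]})
    return clusters


def classify(c):
    """Pick the category whose keywords appear most often in the cluster."""
    counts = Counter()
    for item in c["items"]:
        t = tokens(item["title"])
        for category, keywords in CATEGORY_KEYWORDS.items():
            counts[category] += len(t & keywords)
    best, hits = counts.most_common(1)[0]
    return best if hits > 0 else "Uncategorized"


def score(c):
    """Rank a cluster by how many distinct sources covered the story."""
    return len({item["source"] for item in c["items"]})


# Stand-in for the news source: hypothetical headlines and site names.
items = [
    {"source": "site-a", "title": "Phoenix lander touches down on Mars"},
    {"source": "site-b", "title": "NASA Phoenix lander touches down on Mars"},
    {"source": "site-c", "title": "New gene linked to heart disease"},
    {"source": "site-d", "title": "Mars lander Phoenix touches down safely"},
]

ranked = sorted(cluster(items), key=score, reverse=True)
for c in ranked:
    print(score(c), classify(c), c["items"][0]["title"])
```

The three Mars headlines collapse into one cluster that ranks first because three different sites covered it, which is exactly the deduplication an ordinary RSS reader does not do. Comparing only against each cluster's first headline keeps the sketch short; a real clustering engine would presumably compare full article text and use something sturdier than word overlap.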
This is five thousand types of awesome. I can't tell you how annoying it is to read the same story on ten different sites in my RSS reader because the reader has no way of knowing they are all the same story.