Downes.ca ~ Stephen's Web ~ Bayes on RSS feeds - Unsuitable?

Stephen Downes

Knowledge, Learning, Community
Seb tossed this link to me and I feel like I ought to respond. It begins with the tantalizing idea of using Bayes' theorem, by way of some Perl modules, to autocategorize blog content. Nifty idea. Could it work? Well, not according to the critics. It does not take into account the origin of the feed, it does not take into account the placement of the word, and it does not take into account the relative importance of the word (such as placement in a title). One critic writes, "If the author of the feed has already denoted the news item was 'technology', it would be wise to give this match a probability of 1 for the category 'Technology'." Well, hardly. To assume that people will categorize entities correctly is the height of wishful thinking, in my opinion. To make the Bayesian approach work, what designers should do is evaluate not mere strings but couples: a field paired with a string. I would express it like this: title~RSS (which means, roughly, the title contains the string 'RSS'). If these are the elements used in the Bayesian calculations, then the objections vanish. Mind you, I have just quintupled the number of elements to be considered, so there are other issues to contend with. But all of that said, I'm not ready to go Bayesian just yet. My preference is a type of pattern detection using Perl regular expressions.
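To make the couples idea a little more concrete, here is a rough sketch of how it might look. It is written in Python rather than Perl purely for illustration, and it is not the code from the linked article. The names `couples` and `CoupleBayes`, the field names, and the sample items are all hypothetical, and the add-one smoothing is my assumption about a reasonable default. The point is only that each feature is a field~token couple, so that 'RSS' in a title and 'RSS' in a body count as different evidence, and the author's own category tag becomes one more couple rather than a probability of 1.

```python
import math
import re
from collections import Counter, defaultdict


def couples(item):
    """Turn an item (a dict of field -> text) into 'field~token' couples."""
    feats = []
    for field, text in item.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            feats.append(f"{field}~{token}")
    return feats


class CoupleBayes:
    """Naive Bayes over field~token couples (hypothetical illustration)."""

    def __init__(self):
        self.doc_counts = Counter()               # items seen per category
        self.feat_counts = defaultdict(Counter)   # couple counts per category
        self.vocab = set()                        # every couple seen in training

    def train(self, item, category):
        self.doc_counts[category] += 1
        for feat in couples(item):
            self.feat_counts[category][feat] += 1
            self.vocab.add(feat)

    def classify(self, item):
        total_docs = sum(self.doc_counts.values())
        feats = couples(item)
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Log prior plus log likelihoods, with add-one smoothing (an assumption).
            score = math.log(n_docs / total_docs)
            denom = sum(self.feat_counts[category].values()) + len(self.vocab)
            for feat in feats:
                score += math.log((self.feat_counts[category][feat] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


clf = CoupleBayes()
# Made-up training items; the second one carries a wrong author tag on purpose.
clf.train({"title": "RSS feed readers compared",
           "category": "technology",
           "body": "aggregators and XML"}, "Technology")
clf.train({"title": "Garden notes",
           "category": "technology",
           "body": "tomatoes and soil"}, "Gardening")

result = clf.classify({"title": "New RSS aggregator",
                       "category": "misc",
                       "body": "XML parsing"})
print(result)
```

Because the author's tag is treated as just another couple (category~technology), a mislabeled item like the gardening post above does not force a wrong answer; it only shifts the odds a little.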




Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: May 02, 2024 02:58 a.m.

Creative Commons License.
