How RSS Can Succeed

Feb 24, 2004
By Stephen Downes

I posted this item in response to RSS: A Big Success In Danger of Failure, by Bill Burnham.

I'm not so sure I agree that RSS is as deeply rooted in push technology as this article suggests, but I am in overall agreement with the main thesis. In particular, I believe that neither search nor classification will address the complexities introduced by large numbers of feeds. But alternative approaches will address these shortfalls.

What we need to keep in mind is twofold: first, that for the most part most content of most feeds will be irrelevant to any given reader. And second, as is suggested, metadirectories organizing and categorizing feed content provide enough filtering for most people.

As evidence, I cite my Edu_RSS service, and in particular, Edu_RSS Topics. The output from this service is a set of highly focused, yet comprehensive, RSS feeds of interest to educational technologists.

Edu_RSS approaches the problem from two directions. First, it harvests from only a small subset of feeds - 300 or so out of the hundreds of thousands available. These feeds are representative - that is, since most of them are blogs, a collective gathering and filtering effort has already taken place. The actual list of sources numbers in the thousands, arguably the entire set of sources in the field.

After aggregating these feeds, Edu_RSS combines the content and organizes it into a set of categories (or 'topics'). The topics are defined using Perl (and Unix) regular expressions, a flexible filtering mechanism that allows the selection of numerous expressions within a single phrase. The use of regular expressions allows the service to identify string combinations characteristic of a given topic, and thus results in a well-selected set of resources.
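To make the idea concrete, here is a minimal sketch of regular-expression topic filtering. It is written in Python rather than Perl, and it is not the Edu_RSS code: the topic names, the patterns, and the `classify` and `build_topic_feeds` helpers are all assumptions made up for illustration.

```python
import re

# Hypothetical topic definitions. The names and patterns are illustrative
# assumptions, not the expressions Edu_RSS actually uses.
TOPICS = {
    "learning objects": re.compile(
        r"learning\s+objects?|\bSCORM\b|\bLOM\b", re.IGNORECASE
    ),
    "rss": re.compile(
        r"\bRSS\b|\baggregat(?:or|ors|ion)\b|syndicat", re.IGNORECASE
    ),
}


def classify(item):
    """Return the sorted names of all topics whose pattern matches the item."""
    text = item.get("title", "") + " " + item.get("description", "")
    return sorted(name for name, pattern in TOPICS.items() if pattern.search(text))


def build_topic_feeds(items):
    """Group aggregated items into one list per topic; an item may land in several."""
    feeds = {name: [] for name in TOPICS}
    for item in items:
        for name in classify(item):
            feeds[name].append(item)
    return feeds


if __name__ == "__main__":
    items = [
        {"title": "New SCORM profile for learning objects", "description": ""},
        {"title": "Lunch menu", "description": "Soup"},
    ]
    for topic, matched in build_topic_feeds(items).items():
        print(topic, [i["title"] for i in matched])
```

A single pattern can name several alternative phrasings of a topic, which is the property relied on above: one expression per topic, many string combinations matched.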

According to my website statistics, Edu_RSS is consistently one of the most popular URLs on my website, following only the two files that generate my referrer system (which is another story). The filtering system is very effective: if something significant is published on, say, learning objects, it will appear as one of the fewer than half a dozen daily items in the 'learning objects' feed.

The mistake made by the early advocates of push - and by a commentator just above - lies in the idea that 'brand' will replace intelligent filtering. Brand fails because in order for something to be a brand, it must appeal to a large mass of people. But if it appeals to a large mass of people, it will invariably disappoint people looking for something more specific. The early advocates of push tried to promote existing brands, and readers found in push nothing they couldn't find in mass media.

I have argued elsewhere that the only way to approach content location on the internet is to treat it as a self-organizing network. What this means is that inherent in the structure of the internet there are distinct layers of filtering mechanisms, each consisting of a "gather filter forward" mechanism. In some cases, the mechanism is fulfilled by a human agent, as in the case of blogs. In others, it is fulfilled by automatic mechanisms, such as Edu_RSS. And it is likely that Robin Good's newsmasters will in their own way also play the same role.

What's important here is that each node of each layer need not worry about the rest, and need not be focused on the goal of the system. The agent seeks what is available, the way a retinal cell gathers light, and passes on what is relevant, the way a neuron passes on a signal. The filtering occurs not in the individual node, but through the independent actions of the aggregation of nodes.
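A rough sketch of this "gather filter forward" layering, in Python, may help. Everything in it is hypothetical: the `Node` class, its `keep` rule, and the sample items are invented for illustration and describe no actual system. Each node applies only its own coarse rule to whatever reaches it, and the narrow result emerges from the layers taken together.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One agent in the network: a blogger, an aggregator, a topic feed."""
    name: str
    keep: Callable[[str], bool]            # this node's own coarse filter
    sources: List["Node"] = field(default_factory=list)
    own_items: List[str] = field(default_factory=list)

    def output(self) -> List[str]:
        # Gather: whatever this node found itself, plus what its sources forward.
        gathered = list(self.own_items)
        for source in self.sources:
            gathered.extend(source.output())
        # Filter and forward: apply the local rule only, dropping duplicates.
        seen = set()
        forwarded = []
        for item in gathered:
            if item not in seen and self.keep(item):
                seen.add(item)
                forwarded.append(item)
        return forwarded


# Two human filters (blogs), each with its own idea of what is worth passing on.
blog_a = Node("blog_a", keep=lambda s: "cat" not in s,
              own_items=["rss in schools", "cat photos", "learning objects primer"])
blog_b = Node("blog_b", keep=lambda s: "football" not in s,
              own_items=["learning objects primer", "football scores",
                         "metadata for learning objects"])

# An automatic filter one layer up. It knows nothing about how the blogs chose.
topic_feed = Node("learning objects", keep=lambda s: "learning objects" in s,
                  sources=[blog_a, blog_b])

if __name__ == "__main__":
    print(topic_feed.output())
```

No node in the sketch sees the whole network or knows the goal of the system; the only requirement is that each can read the output of its sources.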

The reason why this system works, while other approaches do not, is that there is no reasonable mechanism which can apply the vast requirements of filtering to a single resource. If we use metadata, the indexing soon outweighs the content. If we use search engines, each resource must be subject to extensive analysis to determine context (or, we do without context, which results in a search for 'calf' linking to sites on agriculture and anatomy).

The layered mechanism works because at no point is the entire weight of the filtering process concentrated in a single individual or a single resource. Decisions about selection and classification are made on a case by case basis using very coarse, and unregulated, mechanisms. It means that individual agents can work without the need for central control, with the only requirement for a functional system being an open set of connections between the agents.

RSS is, today, the transport mechanism of choice. There is nothing magical about RSS, except for the fact that it just is an autonomous agent system providing a high degree of connectivity. As the system matures, additional encoding systems, such as FOAF, say, or ODRL, will play their own important roles, offering different kinds of connections within the same network. The decisions made will become richer, without a corresponding increase in the complexity of the system.

So, RSS could succeed. It will probably succeed. But it is important to keep our focus on what it does well: it allows an individual to scan, filter, and pass forward. That's all it ever has to do. The network will do the rest.