Dec 03, 2002
RSS stands for 'Rich Site Summary' and is a type of XML document used to share news headlines and other types of web content. Originally designed by Netscape to create content 'channels' for its My Netscape pages, RSS has been adopted by news syndication services, weblogs, and other online information services.
Because it is one of the simplest uses of XML, RSS has become widely distributed. Content developers use RSS to create an XML description of their web site. The RSS file can include a logo, a site link, an input box, and multiple news items. Each news item consists of a URL, a title, and a summary.
Content developers make their RSS files available by placing them on their web server. In this way, RSS 'aggregators' are able to read the RSS files and therefore to collect data about the website. These aggregators place the site information into a larger database and use this database to allow for structured searches of a large number of content providers.
Because the data is in XML, and not a display language like HTML, RSS information can be flowed into a large number of devices. In addition to being used to create news summary web pages, RSS can be fed into stand-alone news browsers or headline viewers, PDAs, cell phones, email ticklers and even voice updates.
The strength of RSS is its simplicity. It is exceptionally easy to syndicate website content using RSS. It is also very easy to use RSS headline feeds, either by viewing a news summary web page or by downloading one of many free headline viewers. Though most RSS feeds list web based resources, several feeds link to audio files, video files and other multimedia.
Why RSS is Important for Educational Designers
RSS is the first working example of an XML data network. As such, and in this world of learning objects and metadata files, RSS is the first working example of what such a network will look like for educational designers. Just as news resources are indexed and distributed in the RSS network, so also educational resources can be indexed and distributed in a similar learning object network.
The model provided by RSS is very different from the model provided today by learning content management systems (LCMSs). In the world of the LCMS, everything is contained in one very large software application. Insofar as content is distributed at all, it is distributed in bundled content libraries. This means that educational institutions must make a major investment in software and expertise in order to access learning content.
RSS, by contrast, is not centralized. It is distributed. Content is not distributed in bundles, it is distributed one item at a time. There is no central store, repository or library of RSS content; it is all over the internet. To access and use RSS content in a viewer or in a web page, you do not need a large software application. A simple RSS reader will do the trick.
For this reason, the distribution of educational content over the internet will look a lot more like an RSS network than it will an enterprise content management system. Many more people will use a distributed learning object network not only because it's easier and cheaper, but because they can access much more content for much less money.
As a result, the concept of syndicated educational content can really come into play. While there will always be a need for reusable learning objects (RLOs), anything that can have an educational application - including images, videos, journal articles, even news items - can be distributed through a learning object syndication network.
The RSS Network Architecture
An RSS network consists of three major components:
1. A (large) number of content providers, each providing news articles, and each providing their own RSS files describing these news articles.
2. A (smaller) number of RSS aggregators that read these RSS files from multiple sources, collect them into an index, and provide customized 'feeds' of topic-specific news headlines from this index.
3. A (large) number of news viewing applications that, based on user input, connect to an RSS aggregator, access a news feed, and display it to the reader. On viewing the news feed, the reader can then select a news item (by clicking on the headline) and read the article directly from the content provider.
The RSS network architecture looks like this:
A single RSS file is typically called an RSS channel. This is a lot like a television channel or a radio channel: it contains news items from a single source. For example, to the right is an HTML view of an RSS channel from the online magazine First Monday.
An RSS channel consists of two major sets of elements:
Channel Properties - the name of the channel (in this case, First Monday), a home URL for the channel, and an image for the channel.
Item Properties - the separate news items listed in the channel. In this case, there are ten news items listed. Each item has a headline and a URL. In some cases, an item will also contain a short summary, a publication date, author information, and more.
In order to define a channel like the one on the right, the channel properties and the item properties are defined in an XML file (or to be more precise, an RSS file), as follows:
At the top of the text box is a declaration of the type of XML file being used.
Next we see an XML field describing the RSS channel. Within this field is the channel name, link and description.
Finally, we see a list of the items available in the channel (I have only listed two items here). Each item is described with a title and a URL.
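As a rough illustration of the structure just described, the following Python sketch builds a small RSS file with a declaration, a channel (title, link, description) and two items. The channel name, URLs and headlines are placeholders invented for the example, not the First Monday data, and the choice of version 0.91 is an assumption.

```python
import xml.etree.ElementTree as ET


def build_channel(title, link, description, items):
    """Return an RSS document (as a string) for one channel and its items."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    # Channel properties: name, home URL and description.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # Item properties: one <item> per news item, each with a title and a URL.
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    # The XML declaration goes at the top of the file.
    return '<?xml version="1.0"?>\n' + ET.tostring(rss, encoding="unicode")


# Placeholder values, for illustration only.
rss_text = build_channel(
    "Example Channel",
    "http://www.example.org/",
    "A placeholder channel used for illustration",
    [
        ("First headline", "http://www.example.org/1.html"),
        ("Second headline", "http://www.example.org/2.html"),
    ],
)
print(rss_text)
```

Optional item fields such as a summary or publication date would be added as further child elements of each item.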
Creating an RSS Channel
Because an RSS channel is an XML file, it can be created using a plain text editor - the same sort of editor that you might use to create an HTML page. It is usually easier to start with a template (such as the RSS file displayed on the previous page) and to insert your own values for each tag.
Typically, though, RSS files are created automatically. This is possible because an RSS file has a standard format. Thus, if you have a database of articles, then you can easily create an RSS channel from that database by extracting table data into XML data.
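A minimal sketch of this kind of automatic generation, assuming a hypothetical articles table with title, url and summary columns (the table, its contents and the function name are all invented for the example), might look like this in Python:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical article database, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, url TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("First article", "http://www.example.org/a1.html", "Summary one"),
        ("Second article", "http://www.example.org/a2.html", "Summary two"),
    ],
)


def channel_from_table(conn, title, link, description):
    """Extract table rows and write them out as RSS items."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    query = "SELECT title, url, summary FROM articles ORDER BY rowid"
    for row_title, row_url, row_summary in conn.execute(query):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row_title
        ET.SubElement(item, "link").text = row_url
        ET.SubElement(item, "description").text = row_summary
    return ET.tostring(rss, encoding="unicode")


db_rss = channel_from_table(
    conn, "Example Articles", "http://www.example.org/", "Generated from a database"
)
print(db_rss)
```

Run on a schedule, a script of this sort would keep the RSS file in step with the database without any hand editing.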
Another popular means of creating an RSS file is by means of scraping an HTML file. To scrape an HTML file is to extract link titles and URLs from an ordinary web page. This is done by analyzing the HTML anchor tags for the link title and URL.
A short script in a language such as Perl will generate a list of the URLs and titles in almost any HTML page. Thus it is very easy to write a script that will generate an RSS file from any web page.
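As an illustration of the scraping idea, here is a sketch in Python rather than Perl, using the standard library's HTML parser. It is not the original script; the class name and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect (url, title) pairs from the <a href="..."> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # URL of the link currently being read, if any
        self._text = []    # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces in the link title.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((self._href, title))
            self._href = None


# Placeholder page, for illustration only.
page = """<html><body>
<h1>News</h1>
<a href="http://www.example.org/1.html">First headline</a>
<p>Some text <a href="http://www.example.org/2.html">Second
headline</a></p>
<a name="top"></a>
</body></html>"""

scraper = LinkScraper()
scraper.feed(page)
for url, title in scraper.links:
    print(url, title)
```

Each (url, title) pair could then be written out as an RSS item. A real page would also need relative URLs resolved and navigation links filtered out, which this sketch does not attempt.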
There are online services, such as Moreover, that specialize in HTML scraping. Moreover scans the web pages of major newspapers from around the world and generates RSS channels for them. Moreover also provides a series of specialized RSS feeds.
Weblogs, or as they are sometimes called, blogs, have a unique role in the world of RSS. A weblog is, in the first instance, a web page that is updated on a regular basis. Thus a weblog resembles a diary or a journal; entries are dated and each day the weblog web page contains something new.
What distinguishes a weblog from a personal web page, though, is that the weblog consists of a series of entries associated with links to other resources on the web. Thus the typical weblog consists of a list of sites, descriptions of those sites, and some discussion.
My daily newsletter, OLDaily, pictured at right, is a typical example of a weblog.
OLDaily has channel elements, such as the newsletter title and home page URL.
The difference is in the items. I am not listing my own articles. I am listing articles published by someone else. The description, however, is mine. I am providing my description and interpretation of someone else's material.
Also worth noting is that I did not obtain my items from a single source. As you can see by looking at the items, I have listed different articles by different authors working for different publications.
So a channel need not be produced by a content producer. A channel can be created by anybody with something to say about the items being described.
The RSS for OLDaily looks much like the RSS created for First Monday. If you were to look at it, though, you would find several more tags: specifically, tags to denote the author, publisher and publication date of the article, along with the URL and the title.
An RSS aggregator is a type of software that periodically reads sets of RSS files and indexes them for display or syndication. There are two major types of aggregator: centralized and personal.
A centralized aggregator is intended for use by a number of people. RSS files are read by the centralized aggregator and are then used to create a topic-specific web page or customized RSS feeds (as in the diagram above).
The Moreover aggregator, for example, culls RSS from a variety of sources (including HTML pages, which it scrapes). It then provides RSS feeds devoted to specific topics - such as Microsoft, as illustrated - that can be used on web pages.
Another well known centralized aggregator is a web site called News Is Free. At latest report, the aggregator collects headlines from 3744 sources and allows readers to browse the headlines or to search for the latest news. The site also offers headline syndication and web services.
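The basic work of a centralized aggregator (read several RSS files, collect the items into one index, then serve a topic-specific feed from that index) can be sketched in Python as follows. This is a simplified illustration, not how Moreover or News Is Free are implemented. The two feeds are placeholders, the keyword match on titles is an assumed stand-in for real topic classification, and the parsing assumes RSS 0.91-style files in which items sit inside the channel element.

```python
import xml.etree.ElementTree as ET

# Two placeholder RSS files standing in for files fetched from content providers.
FEED_A = """<rss version="0.91"><channel>
<title>Site A</title><link>http://a.example.org/</link>
<item><title>Microsoft releases update</title><link>http://a.example.org/1</link></item>
<item><title>Local weather report</title><link>http://a.example.org/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="0.91"><channel>
<title>Site B</title><link>http://b.example.org/</link>
<item><title>New library opens</title><link>http://b.example.org/1</link></item>
<item><title>Analysts comment on Microsoft</title><link>http://b.example.org/2</link></item>
</channel></rss>"""


def read_channel(rss_text):
    """Parse one RSS file into a list of item dictionaries."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title", default="")
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]


def build_index(feeds):
    """Collect the items from every RSS file into a single index."""
    index = []
    for rss_text in feeds:
        index.extend(read_channel(rss_text))
    return index


def topic_feed(index, keyword):
    """Return the indexed items whose title mentions the keyword."""
    keyword = keyword.lower()
    return [entry for entry in index if keyword in entry["title"].lower()]


index = build_index([FEED_A, FEED_B])
for entry in topic_feed(index, "microsoft"):
    print(entry["source"], "-", entry["title"], entry["link"])
```

The result of topic_feed could be rendered as a web page or written back out as a new RSS channel, which is what makes the aggregator a source of customized feeds.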
A personal aggregator is an application that runs on a user's desktop. It can access a centralized aggregator (in which case it functions as a headline viewer) or, more frequently, it can access an RSS channel directly. This is called subscribing to the RSS channel.
Radio Userland, for example, accesses a list of channels from a centralized aggregator. The user selects a set of channels from this list and subscribes to them. Radio then updates item listings from the selected channels once an hour. Using the data supplied from the RSS files, it also facilitates the creation of a personalized weblog (which can in turn be published as another RSS channel).
Another popular personal aggregator is called Amphetadesk. As with Radio Userland, users can select from a list of channels supplied by a centralized aggregator. Amphetadesk users can also subscribe to a channel directly if the channel provider has provided a specialized subscription script.
Aaron Swartz has written a novel aggregator that converts RSS channels into email messages.
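The general idea of turning channel items into email messages can be sketched with Python's standard library. This is an illustration of the concept only, not Swartz's program; the function name, addresses and sample channel are invented, and actually sending the messages (for example with smtplib) is left out.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage

# Placeholder channel, for illustration only.
SAMPLE_RSS = """<rss version="0.91"><channel>
<title>Example Channel</title><link>http://www.example.org/</link>
<item><title>First headline</title><link>http://www.example.org/1.html</link>
<description>A short summary of the first item.</description></item>
<item><title>Second headline</title><link>http://www.example.org/2.html</link></item>
</channel></rss>"""


def items_to_messages(rss_text, sender, recipient):
    """Build one email message per item in the channel."""
    channel = ET.fromstring(rss_text).find("channel")
    channel_title = channel.findtext("title", default="")
    messages = []
    for item in channel.findall("item"):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        # The headline becomes the subject line, prefixed by the channel name.
        msg["Subject"] = "[%s] %s" % (channel_title, item.findtext("title", default=""))
        # The summary (if any) and the item URL form the message body.
        summary = item.findtext("description", default="")
        link = item.findtext("link", default="")
        msg.set_content(summary + "\n\n" + link)
        messages.append(msg)
    return messages


messages = items_to_messages(SAMPLE_RSS, "aggregator@example.org", "reader@example.org")
for msg in messages:
    print(msg["Subject"])
```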
Metadata Harvesting Generally
RSS aggregators are members of a wider class of software called harvesters. The purpose of a harvester is to retrieve and parse metadata located on remote servers, providing the information in a usable form for various applications.
In educational design, metadata harvesting involves the aggregation of metadata records associated with education and training resources. Aggregation provides greater exposure of those resources to the wider community. Aggregation also promotes the reuse of resources and encourages the development of interoperable resources.
The best known metadata harvesting initiative in the education community is the Open Archives Initiative (OAI). The purpose of the Open Archives Initiative is to provide access to academic papers and other resources over the internet.
In the terminology employed by OAI, the content provider is called the data provider. The aggregator is called the service provider. And a harvester is a client application used by the service provider in order to access metadata from the data provider.
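To make the harvester's role concrete, the sketch below shows the two halves of a simple harvest in Python: building a ListRecords request for a data provider, and parsing the metadata records out of the XML response. The ListRecords verb, the metadataPrefix argument, the oai_dc format and the namespaces come from version 2.0 of the OAI protocol; the base URL, the sample response and the function names are invented for the example, and no network request is actually made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a harvester sends to a data provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(response_text):
    """Extract identifier, title and creator from a ListRecords response."""
    root = ET.fromstring(response_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creator": record.findtext(".//dc:creator", default="", namespaces=NS),
        })
    return records


# Placeholder response standing in for what a data provider might return.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-10-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

request_url = list_records_url("http://archive.example.org/oai")
print(request_url)
harvested = parse_records(SAMPLE_RESPONSE)
for rec in harvested:
    print(rec["identifier"], rec["title"], rec["creator"])
```

A service provider would run requests like this against many data providers and merge the results into its own index, much as an RSS aggregator does with channels.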
A number of initiatives have emerged based on the OAI harvesting protocols. The Public Knowledge Project, for example, is an initiative based at the University of British Columbia intended to develop an Open Journal System (OJS). The OJS assists with every stage of the refereed publishing process, from submissions through to online publication and indexing.
Another project is the Illinois OAI Protocol Metadata Harvesting Project. The public face of this project resembles a centralized aggregator in that it provides a search window for academic articles. It then displays the full metadata record for the selected article.
The National SMETE Digital Library (NSDL) is another organization looking at metadata harvesting. The model described by NSDL mirrors almost exactly the model used by the RSS community. The NSDL project is attempting to collect metadata not only from OAI compliant archives but also from a wider variety of metadata sources. This, reports the NSDL, causes a problem not in the collection process but in service delivery.
The purpose of a headline viewer is to provide a list of headlines obtained from an aggregator. When a user selects from this list of options (by clicking on a headline), the headline viewer retrieves the article from the source site and displays it for reading.
Many headline viewers exist. One of the earliest and best known is Carmen's Headline Viewer. This program runs as a stand-alone application on the desktop and taps into major centralized repositories such as My Userland, XMLTree, News Is Free and Grok Soup.
The major difference between a headline viewer and a personal aggregator (described above) is in the display of the results. Carmen's Headline Viewer, as can be seen from the screen shot at right, displays headlines sorted by topic. Thus the reader is not restricted to a small set of channel subscriptions; instead, they obtain topic-specific feeds.
Other headline viewers, such as Novobot, create web pages for viewing headlines. This has the advantage of being a familiar interface for most users. However, web pages listing many resources can take a while to load.
What RSS Does Not Provide
RSS is a powerful tool for content syndication. However, it lacks some important features needed to develop into a robust solution for the location and organization of educational content.
One of the major deficiencies (identified in the NSDL paper) is the lack of digital rights management for both original articles and metadata feeds. RSS assumes that all articles and metadata are published on the open web and are therefore freely available for viewing by anyone. This means that resources for which there is a fee cannot be accessed through the RSS network.
Another major problem, again identified in the NSDL report, is RSS's inability to deal with mixed metadata. Over the years various types of RSS have developed (RSS 0.9, RSS 1.0, RSS 2.0) and the tools have adapted on a case by case basis. RSS aggregators, however, still cannot access alternative forms of metadata, much less display resources from a wide array of both news and non-news sources.
A third problem for RSS is in the manner it handles weblogs. As described above, weblogs are commentaries on original resources. Yet they are displayed in the same format, and in the same place, as original articles. This can result in duplicate listings when the same resource is described in several weblogs. In addition, there is no means to display the comments from many weblogs side by side.
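The duplicate listing problem is easy to see once items from several weblogs are grouped by the resource they describe. The Python sketch below does this for a few invented entries. RSS itself defines no such grouping, so this is only a hypothetical illustration of what a side-by-side view would require, not a feature of any existing tool.

```python
from collections import defaultdict

# Placeholder weblog items, for illustration only.
weblog_items = [
    {"weblog": "Weblog A", "link": "http://www.example.org/paper.html",
     "comment": "A useful overview."},
    {"weblog": "Weblog B", "link": "http://www.example.org/paper.html",
     "comment": "Misses the key point."},
    {"weblog": "Weblog A", "link": "http://www.example.org/other.html",
     "comment": "Worth a look."},
]


def group_by_resource(items):
    """Group weblog comments under the URL of the resource they describe."""
    grouped = defaultdict(list)
    for entry in items:
        grouped[entry["link"]].append((entry["weblog"], entry["comment"]))
    return dict(grouped)


by_resource = group_by_resource(weblog_items)
for link, comments in by_resource.items():
    print(link)
    for weblog, comment in comments:
        print("   ", weblog + ":", comment)
```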
Arms, William Y. et al. October, 2002. A Case Study in Metadata Harvesting: the NSDL. http://www.cs.cornell.edu/lagoze/papers/Arms-et-al-LibraryHiTech.pdf
Carmen's Headline Viewer. http://www.headlineviewer.com/
Education Network Commonwealth of Australia. 2002. About Metadata Harvesting. http://www.edna.edu.au/harvesting/module2.html
Illinois OAI Protocol Metadata Harvesting Project. http://oai.grainger.uiuc.edu/search
Jackson, Dean. September 19, 2002. Aaron Swartz's RSS to Email Aggregator. http://www.w3.org/2002/09/rss2email/
News Is Free. http://www.newsisfree.com
Public Knowledge Project. University of British Columbia. http://www.pkp.ubc.ca/ojs/
Radio Userland. http://radio.userland.com/
Van de Sompel, Herbert, and Lagoze, Carl. June 14, 2002. The Open Archives Initiative Protocol for Metadata Harvesting. Version 2.0. Open Archives Initiative. http://www.openarchives.org/OAI/openarchivesprotocol.html
Winer, Dave. October 8, 2002. What is a News Aggregator. http://davenet.userland.com/2002/10/08/whatIsANewsAggregator