UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Is It Too Late To Exploit RSS In Repositories?

Posted by Brian Kelly on 22 Dec 2010

A few years ago we had discussions about ways in which information about UKOLN peer-reviewed papers could be more effectively presented. We asked “Could we provide a timeline view? Or how about a Wordle display which illustrates the variety of subject areas researchers at UKOLN are engaged in?” The answer was yes, we could, but it wouldn’t be sensible to carry out the development work ourselves. Rather, we should ensure that our publications were made available in Opus, the University of Bath’s institutional repository. And since repositories are based on open standards, we would be able to reuse the metadata about our publications in various ways.

We now have a UKOLN entry in Opus and there’s also an RSS feed for the items. And similarly we can see entries for individuals, such as myself, and have an RSS feed for individual authors.

Unfortunately the RSS feed is limited to the ten most recently deposited items rather than returning all 223 items for UKOLN or the 45 items belonging to me. The RSS feed is failing to live up to expectations and isn’t much use :-(

The Leicester Research Archive (LRA), in contrast, does seem to provide a comprehensive set of data available as RSS. So, for example, if I go to the Department of Computer Science’s page in the repository there is, at the bottom right of the page (though, sadly, not available as an auto-discoverable link), an RSS feed – and this includes all 50 items.
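Auto-discovery works by putting a `<link>` element in a page’s `<head>`, which feed readers look for when given an ordinary page URL. As a minimal sketch (the page markup and feed path below are made up for illustration; the real LRA page lacked such a link), here is how a client can find an advertised feed using only the Python standard library:

```python
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate" type="application/rss+xml">."""
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link"
                and a.get("rel", "").lower() == "alternate"
                and a.get("type") == "application/rss+xml"):
            self.feeds.append(a.get("href"))

# Hypothetical page head advertising a departmental feed.
page = """<html><head>
<link rel="alternate" type="application/rss+xml"
      title="Department feed" href="/feed/dept-cs.rss">
</head><body>...</body></html>"""

finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)  # ['/feed/dept-cs.rss']
```

Without that one line of markup in the page head, tools that consume feeds have no way of discovering the feed automatically, which is the gap noted above.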

Sadly, when I tried to process this feed in Wordle, Dipity and Yahoo! Pipes, I had no joy, with the feed being rejected by all three applications. I did wonder if the feed might be invalid, but the W3C feed validator and the RSS Advisory Board’s RSS Validator only gave warnings. These warnings might indicate the problem, as the RSS feed contained XML elements which might not be expected in an RSS feed.
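One quick way to see which elements a strict consumer might trip over is to list everything inside each `<item>` that lies outside the core RSS 2.0 vocabulary. A sketch, using a made-up feed fragment with a Dublin Core element mixed in (the kind of repository metadata that off-the-shelf RSS tools may not expect):

```python
import xml.etree.ElementTree as ET

# The item-level elements defined by the RSS 2.0 specification.
CORE = {"title", "link", "description", "pubDate", "guid",
        "author", "category", "comments", "enclosure", "source"}

def unexpected_elements(rss_text):
    """Return tags inside <item> that are not core RSS 2.0 elements.

    Namespaced tags come back in ElementTree's '{uri}local' form.
    """
    root = ET.fromstring(rss_text)
    odd = set()
    for item in root.iter("item"):
        for child in item:
            if child.tag not in CORE:
                odd.add(child.tag)
    return sorted(odd)

# Hypothetical feed fragment for illustration.
feed = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel><title>Demo</title><link>http://example.org/</link>
    <description>d</description>
    <item>
      <title>Paper one</title>
      <dc:creator>A. Author</dc:creator>
    </item>
  </channel>
</rss>"""

print(unexpected_elements(feed))
# ['{http://purl.org/dc/elements/1.1/}creator']
```

Extension elements in their own namespace are legal in RSS 2.0 (hence the validators only warning), but a fragile consumer may still reject or mishandle them.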

But whilst my experiment to demonstrate how widely available applications which process RSS feeds could be used to enrich the outputs from an institutional repository has been unsuccessful to date, I still feel that we should be encouraging developers of institutional repository software to provide full RSS feeds which can be processed by popular services which consume RSS.

I have heard arguments that providing full RSS feeds might cause performance problems – but is that necessarily the case? I’ve also heard it suggested that we should be using ‘proper’ repository standards, meaning OAI-PMH – but as Nick Sheppard has recently pointed out on the UKCORR blog:

I have for some time been a little nonplussed by our collective, continued obsession with the woefully under-used OAI-PMH. Other than OAIster (an international service), the only service I’m currently aware of in the UK is the former Intute demo now maintained by Mimas.

In his post Nick goes on to ask “Perhaps OAI-PMH has had its day”. It’s unfortunate, I feel, that RSS does not seem to have been given the opportunity to show how it can be used to provide value-added services to institutional repositories. Is it too late?

8 Responses to “Is It Too Late To Exploit RSS In Repositories?”

  1. Ben Toth said

    I hope not. In general terms RSS has yet to be exploited as fully as it might have been. Why, I’m not sure. Perhaps it has suffered from being too complicated for some information professionals and not complicated enough for others.

  2. […] This post was mentioned on Twitter by AJCann and Miquel Duran, Brian Kelly. Brian Kelly said: Is It Too Late To Exploit RSS In Repositories?: A few years ago we had discussions about ways in which informati… http://bit.ly/hMvGPB […]

  3. Tony Hirst said

    I used to advocate the adoption of RSS a lot, and came across some of the problems you mention repeatedly, such as the inability to consume certain pages in off-the-shelf feed consuming apps.

    Many of the problems resulted from non-standard character encodings, or incorrectly encoded item.description text. Links/URLs were occasionally missing or pointless (e.g. pointing to the root domain from which the feed was served, rather than anything relating to the particular feed item). Generating sensible URLs for feed items could also turn up issues with the way pages were served, eg on sites where session variables or other arbitrary keys were required.
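    The link problems described here (items pointing at the root of the domain, or missing a link entirely) are easy to check for mechanically. A rough sketch, assuming an RSS 2.0 feed string (the feed content below is invented for illustration):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def suspect_links(rss_text):
    """Flag items whose <link> is missing or just points at a site root."""
    problems = []
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        title = item.findtext("title", "(untitled)")
        link = item.findtext("link")
        if not link:
            problems.append((title, "missing link"))
        elif urlparse(link).path in ("", "/"):
            problems.append((title, "link is just the site root"))
    return problems

feed = """<rss version="2.0"><channel>
  <title>Demo</title><link>http://example.org/</link><description>d</description>
  <item><title>Good</title><link>http://example.org/papers/1</link></item>
  <item><title>Rooty</title><link>http://example.org/</link></item>
  <item><title>Bare</title></item>
</channel></rss>"""

for title, why in suspect_links(feed):
    print(title, "-", why)
```

    A check like this goes beyond what a validator reports: a feed can be perfectly valid XML and still have item links that are useless to anyone consuming it.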

    The problems were allowed to slip through because of the context in which the feeds were published. E.g. a request goes in for ‘we need a feed’; developer adds feed, runs it through validator, job done.

    But the job isn’t done, just as the job isn’t done when someone publishes a public/open data set but doesn’t do anything more than that, or someone publishes an OER and considers that now it’s public, it’s useful.

    I spend way too much of my time trying to glue things together, and finding more often than not that they don’t play nice. For example, Guardian datastore data often falls just short of being easily combined with other data sets, even other Guardian datastore published datasets, though this is getting better all the time as workflows are tweaked ever so slightly…

    One possible solution, where things are published /with the intentions that others re* them/ is for the publisher to demonstrate a simple remix or combination with at least one other information source.

    If you publish an RSS feed, demonstrate one or two off-the-shelf ways of consuming it. This is what any user is likely to try first, so save them the grief of finding out it doesn’t work by making sure it does.

    When releasing data, if you’re publishing data relating to countries, for example, see if you can use one of the many services for generating map mashups to map the data. If you can’t, what is it in, or missing from, your data that’s making it hard to do?

    If you’re publishing an OER, big or little, /how/ might you see it being remixed/reused with other OERs? If your content includes lots of diagrams, how easy is it for someone else to reuse an image (with attribution and in compliance with any other licence requirements) in their own presentation? If they want to embed it in a blog post (generating not only more views of the content, but also trackable data that you can measure), just try giving a few examples of embedded use. If it’s hard for you as publisher to do the baby steps, why should anyone else bother? (Saying you’re publishing something because you don’t know how other people will use it is not the issue… if it’s hard to do the easy stuff, very few people will bother. The publisher needs to demonstrate the easy stuff, and see it as a way of getting a couple of pragmatic tests implemented as well as a quick tutorial in getting started with re*ing the warez.)

  4. Nick said

    Hi Brian

    My recently reawakened interest in OAI-PMH was partly a result of ukoer work with @xpert_project, which began harvesting RSS feeds for ukoer (as did Jorum) during phase 1 of the programme. Idiosyncrasies of our repository platform (intraLibrary) meant that I couldn’t actually put out an RSS feed in a suitable format, and @patlockley was good enough to work with me to also harvest OAI-PMH. Subsequently we’ve been working on another project to cross-search multiple repositories using aggregated data from Xpert (OER rather than research – see http://acerep.wordpress.com/) and Pat and I discussed the possibility/usefulness of an Xpert-like service also harvesting OA research repositories (hence my post on http://ukcorr.blogspot.com/).

    It’s interesting, I think, that OAI-PMH has arguably been more usefully employed for OER (Globe, OER Commons, Ariadne) than OA research, probably because there are fewer restrictions on the actual digital resources in that context than in OA research, and just aggregating bib records is of limited usefulness – which is why I’m so impressed by http://rian.ie/ in Ireland, which returns full text only…

    The beauty of RSS, of course, is the simplicity of getting stuff both in and out (of a repository), but OAI-PMH is more sophisticated, especially if someone like Pat is then able to develop some clever APIs to do stuff with the data like http://www.nottingham.ac.uk/xpert/labs/

    Nick

  5. […] Originally posted as a comment on Brian Kelly’s Is It Too Late To Exploit RSS In Repositories?: I used to advocate the adoption of RSS a lot, and came across some of the problems you mention repeatedly, such as the inability to consume certain pages in off-the-shelf feed consuming apps. […]

  6. Maybe we’ll only get properly functioning RSS support in repositories once we finally put a stake through OAI-PMH’s dead, cold heart and put an end to any remaining hopes for revivifying it into some sort of living interoperability solution.

  7. Pat said

    https://lra.le.ac.uk/handle/2381/316 – the repository – count up the items – 18 + 2 + 20 + 17 + 77 + 8 – that doesn’t add up to 50.

    It just happens that 50 is more than 10.

    Leicester’s feed seems to be a workaround, and it’s still XML, bar the warnings. The major issue in the feed is the thin nature of the metadata – link, title, desc, date – No keywords, authors, related items? Bit pointless really as a data set.

    As for OAI-PMH vs RSS, that’s a bit like the two tribes in Gulliver’s Travels going to war over which end of an egg to break. They are almost the same thing; changing between the two of them is easy peasy. It took me about an hour to change Xpert from a pure RSS harvester into a combined RSS/OAI-PMH harvester. So I am not sure how one beats the other in terms of interoperability. OAI feeds tend to have a lot more data in, and have more items as well – they are often better data sets than RSS.
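    The sort of conversion Pat describes really is mechanical, since both formats carry title/link/description-style metadata in XML. A sketch of one direction of it, mapping OAI-PMH Dublin Core records to RSS-style items (the namespaces are the standard OAI-PMH and Dublin Core ones; the record content is invented, and a real harvester would also handle resumption tokens and multiple identifiers):

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces, in ElementTree's {uri} form.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def oai_to_rss_items(oai_text):
    """Map each OAI-PMH Dublin Core record to an RSS-like (title, link, description)."""
    root = ET.fromstring(oai_text)
    items = []
    for rec in root.iter(OAI + "record"):
        title = rec.find(".//" + DC + "title")
        ident = rec.find(".//" + DC + "identifier")
        desc = rec.find(".//" + DC + "description")
        items.append((title.text if title is not None else "(untitled)",
                      ident.text if ident is not None else "",
                      desc.text if desc is not None else ""))
    return items

# Hypothetical, heavily trimmed ListRecords response.
oai = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record><metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
              xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>A sample paper</dc:title>
    <dc:identifier>http://example.org/eprint/1</dc:identifier>
    <dc:description>Abstract text.</dc:description>
   </oai_dc:dc>
  </metadata></record>
 </ListRecords>
</OAI-PMH>"""

print(oai_to_rss_items(oai))
```

    The reverse direction is equally shallow, which supports the point that the two protocols differ more in the richness and completeness of the data typically exposed than in any deep technical incompatibility.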

    Most RSS feeds are crap, pretty much for the same reason as Tony mentions. Monkey manager wants RSS, developer makes RSS, Monkey manager happy. No one uses RSS, RSS feed goes wrong, Monkey Manager never finds out, Monkey manager still happy.

    So the reason for shoddiness is, to be honest, no one cares about RSS. There aren’t really any people using it in anger (and if an off the shelf consumer doesn’t like it, it’s not like, y’know, hard, to like, write an XML parser).

    Most repository managers aren’t techie enough to solve the RSS feed problem, and, given there is no demand for it, why should they? It’s not too late for RSS, because I don’t think anyone really wants it that much, so it never really got going, so it can’t really be late for something that’s not happening.

  8. […] Despite a number of third-party services having withdrawn support for RSS, I am still convinced of the benefits of RSS. Those who make use of WordPress software, either as a blogging platform or as a CMS, will be able to exploit the feeds provided by the platform, and many other services still provide RSS. The most significant gap in the services I make use of, however, is ePrints, which drives our institutional repository service. Sadly ePrints support for RSS is very limited and so I am forced to maintain the RSS feed for my publications separately :-(  It would be great if ePrints were to support the interoperability provided in a Web 2.0 world by RSS and not just the much smaller Library world based around OAI-PMH. But, as I asked last year: Is It Too Late To Exploit RSS In Repositories? […]
