CORDRA: ADL's Federated Content Repository Model

Sept 02, 2004
By Stephen Downes

Summary of the ADL Content Object Repository, Discovery and Registration (or Resolution) Architecture, to be demonstrated later this fall and launched early in the new year. The idea is to create a system whereby all learning resources can be given a name and a system where these names can be resolved into physical addresses on the network. Not included in this paper (because I was talking at the time) was the exchange I had with the presenter, Dan Rehak, about the management of the system, the question of whether it breaks the internet into pieces, whether it builds authentication into the network infrastructure, whether the use of handles is the best way to locate objects, and whether the proposed system is or is not the same as RDF. These are all serious issues (in my view, at least), and while Rehak says this is a work in progress, it is also true that it will be dropped on the community as an essential fait accompli early in the new year. I will have more on all this some time in the future.

The project is about repositories. The first thing they did is think of an acronym. Then they decided what it means (I'm the one responsible for changing the C in SCORM from 'course' to 'content').

SCORM specifies how to develop and deploy content objects that can be shared and contextualized to meet the needs of learners. ADL's mission was to try to save money and reuse content by working on interoperability standards. SCORM is a collection of standards and specifications that says how to use these objects together to do some things. It talks about how to develop, share, contextualize, create metadata for, and especially how you create structured learning experiences. SCORM says nothing about how to find stuff and how to get it even if you know that it is there.

ADL's motivation: Department of Defense Instruction (DoDI) 1322.20 (sheesh).

Now there is this big issue: before you buy content, you have to make sure it doesn't already exist, which means you have to be able to find it, so there has to be a repository solution. The content has to be in SCORM, it has to be in an accessible, searchable repository, and you have to actually use that repository.

What CORDRA aims to do is make published content available, persistent, discoverable and exchangeable, and let it be managed so that different people can have different rights, and so that you can tailor this. You have to be able to build systems that fit their model, not force them to use your model.

We started out by saying, what are the requirements for repositories? What are the needs? Business rules? There will be one you know about, and one you don't know about, for security reasons. Even within the one you know about, there may be different access. Each organization will have their own different rules, and we need to write a system that combines these.

We don't want to build new technology, we don't want to build new standards. It's painful. We want to find applicable standards. SCORM does that as a reference model. CORDRA will be like that.

We need to understand how CORDRA fits into an overall framework of things. Diagram from the conference in Australia last month (was in OLDaily). What's the general structure for learning technology?

How do we make this thing scale, scale technically in terms of infrastructure, in terms of size, and so on? I heard people talk about 20 courses, a hundred courses. Our pilot is a thousand courses. When I talk about scale, I talk about a million learning objects, a billion identifiers, a million users.

The plan is to start with tech that we think works and is scalable. We built a prototype that works (last week, it didn't). Because of the timing, a production version is planned for January, a public demo in December. You will see 'ADL Repository' - you will not see the word CORDRA - CORDRA is the generic term, versus the ADL Repository, the instance of CORDRA within the US government.

Assumptions: content for reuse, content in context. What is the equivalent of PageRank for learning objects? We want the top-ranked one not just based on keywords, but for your specific content. Assumption: we want precise results. Assumption: metadata is hard, most metadata pools are not very good (to be kind about them). In the big scheme of things, if people are paying for content, librarians are not expensive (so a metadata-based system has a built-in bias for commercial content - SD). Assumption: flexibility and scale. Assumption: support local policies - allow each organization to define their own metadata, their own IPR, etc. Assumption: lots of unknowns. Design is based on (small-s) services. We have no idea what all the services and components are, but we know we don't know.

[Diagram of CORDRA] We have content, we have users, and in the middle we have common services, code that we don't want to have to write all the time. The two main pieces: first, an identifier system, a way to put a permanent identifier on everything. Even metadata - if a metadata tag is called 'general', 'general' will have a unique identifier. And second, what CORDRA does, is a catalogue of all content, a registry of all repositories, and a registry of all the stuff that describes everything, all the schemas, all the taxonomies. The idea is, if you have one unique identifier, and one password, you can find anything in the world.

CORDRA operations: register a content object, search the content catalogue, register a content repository, query the repository.

Create the content. You have an application that creates content, a SCORM package for example, then I assign a unique identifier, then I tell the system I want to deposit ('publish') the content in a repository. How do I register the piece of content? I have to know I have the content, then I need to get its ID, then registering it tells the system that that piece of content is in the catalogue (doesn't move the content, just says it's there), then get the metadata for that content, then you put some metadata (not all, just enough) in the catalogue.
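The registration steps above can be sketched roughly as follows. This is only an illustration, not ADL's code: the `Catalogue` class, the `keep` parameter, the repository ID '100.xyx/repo1' and the metadata fields are all assumptions made for the example. The point it shows is that registering records where the content lives plus a subset of its metadata, and never moves the content itself.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Hypothetical in-memory stand-in for the content catalogue."""
    entries: dict = field(default_factory=dict)

    def register(self, content_id, repository_id, metadata,
                 keep=("title", "keywords")):
        # Store only "just enough" metadata; the content stays in its repository.
        record = {
            "repository": repository_id,
            "metadata": {k: metadata[k] for k in keep if k in metadata},
        }
        self.entries[content_id] = record
        return record


catalogue = Catalogue()

# Full metadata as held by the repository (illustrative values only).
full_metadata = {
    "title": "Intro to Navigation",
    "keywords": ["maps", "compass"],
    "rights": "internal",
    "size_bytes": 1048576,
}

record = catalogue.register("100.xyx/cp", "100.xyx/repo1", full_metadata)
print(record)
```

Which fields count as 'just enough' would be a policy decision for each community; the two kept here are arbitrary.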

How do you search? You have search criteria, you pass them against the catalogue, the catalogue returns some identifiers and some data, that goes back to the application, then the application decides which one it wants, and that's the selected content.
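As a rough sketch of that search flow, assuming a catalogue that is just a dictionary of identifiers to metadata (the entries, the `search` function and the keyword matching are invented for illustration; the talk did not describe a query language):

```python
def search(catalogue, keyword):
    """Return (identifier, metadata) pairs whose keywords include `keyword`."""
    wanted = keyword.lower()
    return [
        (content_id, meta)
        for content_id, meta in sorted(catalogue.items())
        if wanted in (k.lower() for k in meta.get("keywords", []))
    ]


# Hypothetical catalogue contents.
catalogue = {
    "100.xyx/cp": {"title": "Intro to Navigation", "keywords": ["maps", "compass"]},
    "100.xyx/first-aid": {"title": "First Aid Basics", "keywords": ["medical"]},
    "101.abc/orienteering": {"title": "Orienteering", "keywords": ["Compass", "terrain"]},
}

# The catalogue returns identifiers and some data...
hits = search(catalogue, "compass")

# ...and the application decides which one it wants (here, naively, the first).
selected_id = hits[0][0]
print(selected_id)
```

Note that the search only returns identifiers and descriptive data; fetching the selected content is a separate step.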

How do I get something? I have a piece of content, by ID, then you go through a process of resolution, like DNS, but a different system, the handle system. You say, "please resolve '100.xyx/cp'" (100 is US gov't, 101 is UK gov't, we hope 102 will be Australian gov't). I use the repository registry to find out all about the access methods, everything I need to know about how to access the data. Then it's up to you to go and get it out of the repository.
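A minimal sketch of that two-step lookup, assuming plain dictionaries in place of the real resolution service and repository registry. The repository ID, the endpoint URL and the field names are hypothetical; only the identifier '100.xyx/cp' comes from the talk.

```python
# Step 1 data: which repository holds each identifier (hypothetical).
CATALOGUE = {"100.xyx/cp": "100.xyx/repo1"}

# Step 2 data: how to talk to each repository (hypothetical access details).
REPOSITORY_REGISTRY = {
    "100.xyx/repo1": {
        "protocol": "http",
        "endpoint": "http://repository.example/objects",
        "auth": "password",
    },
}


def resolve(content_id):
    """Resolve an identifier to (repository ID, access description)."""
    try:
        repository_id = CATALOGUE[content_id]
    except KeyError:
        raise LookupError(f"unknown identifier: {content_id}") from None
    return repository_id, REPOSITORY_REGISTRY[repository_id]


repository_id, access = resolve("100.xyx/cp")
print(repository_id, access["endpoint"])
```

Resolution stops at telling the client how to reach the repository; actually retrieving the object is left to the caller, as in the description above.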

Repository registration: get the metadata about the repository, assign that an ID, drop that into the registry of repositories. Query, similar principle.
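Repository registration might look something like the sketch below. The `RepositoryRegistry` class, the 'repo-N' naming scheme and the metadata fields are assumptions for illustration only; how IDs are actually minted was not described.

```python
import itertools


class RepositoryRegistry:
    """Hypothetical registry of repositories under one naming prefix."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._counter = itertools.count(1)
        self.repositories = {}

    def register(self, metadata):
        # Assign the repository description an ID, then file it in the registry.
        repository_id = f"{self.prefix}/repo-{next(self._counter)}"
        self.repositories[repository_id] = dict(metadata)
        return repository_id

    def query(self, **criteria):
        # Return IDs of repositories whose metadata matches every criterion.
        return sorted(
            rid for rid, meta in self.repositories.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )


registry = RepositoryRegistry("100.xyx")
first = registry.register({"name": "Training Library", "protocol": "http"})
second = registry.register({"name": "Archive", "protocol": "ftp"})
print(registry.query(protocol="http"))
```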

The ID infrastructure is based on this thing that's called 'handle', created by Bob Kahn. It does things that DNS doesn't do, for example, multiple resolution - you don't just get one address back, you get multiple, and you sort out which one you want. There are local namespaces for each implementation. Everybody sets up their own subnaming system. The handle '0' is the global handle system - if you ask for '0.na' it describes the root of the handle system.
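To make 'multiple resolution' concrete, here is a small sketch in which one handle maps to several typed values and the client picks what it wants. It is a toy model, not the Handle System protocol: the `HANDLE_RECORDS` table, the value types and the example addresses are invented for illustration.

```python
def parse_handle(handle):
    """Split a handle into its naming-authority prefix and local suffix."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return prefix, suffix


# One handle, several values (hypothetical data).
HANDLE_RECORDS = {
    "100.xyx/cp": [
        {"type": "URL", "value": "http://mirror-a.example/cp"},
        {"type": "URL", "value": "http://mirror-b.example/cp"},
        {"type": "EMAIL", "value": "owner@example.org"},
    ],
}


def resolve(handle, value_type=None):
    """Return every value for the handle, optionally filtered by type."""
    parse_handle(handle)  # reject malformed handles early
    values = HANDLE_RECORDS.get(handle, [])
    if value_type is None:
        return list(values)
    return [v for v in values if v["type"] == value_type]


prefix, suffix = parse_handle("100.xyx/cp")
urls = [v["value"] for v in resolve("100.xyx/cp", "URL")]
print(prefix, suffix, urls)
```

The contrast with DNS is in the return value: a list of typed entries that the caller sorts through, instead of a single address.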

Behaviours, services, etc: identification, authorization, authentication, digital rights, etc., all have to be worked out and all have to be defined in the system. Applications - people will be free to build whatever they want. Each system may build its own search, its own harvest, etc. CORDRA is a layered model - each community gets its own implementation, and can define it independently of whoever runs the infrastructure. The core model is defined to represent anything that describes other pieces. It describes how you describe roles, services, etc., it's one model that's used by everyone (we call this the '99' identifier).

This brings us to federated CORDRA - we have all kinds of instances out there, how do they talk to each other? The idea is that we federate CORDRA. So if I create a top-level repository, I can register all the federations. Handle is a two-level structure, and CORDRA is designed as a two-level structure. You don't federate federations. You have to be a single federation, or you don't play. Single-level registration, and the top level that provides all the implementation details.
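The two-level rule can be illustrated with a short sketch. The class names, the 'ADL' federation name and the repository ID are assumptions for the example; the only thing taken from the talk is the constraint that the top level registers single federations and nothing nests any deeper.

```python
class Federation:
    """A single community implementation holding its own repositories."""

    def __init__(self, name):
        self.name = name
        self.repositories = []

    def add_repository(self, repository_id):
        self.repositories.append(repository_id)


class TopLevelRegistry:
    """The one top level; it registers federations and nothing else."""

    def __init__(self):
        self.federations = {}

    def register(self, member):
        # Two levels only: a registry of federations cannot itself be a member.
        if not isinstance(member, Federation):
            raise TypeError("only single federations can be registered")
        self.federations[member.name] = member

    def find_repository(self, repository_id):
        for name, federation in self.federations.items():
            if repository_id in federation.repositories:
                return name
        return None


root = TopLevelRegistry()
adl = Federation("ADL")
adl.add_repository("100.xyx/repo1")
root.register(adl)
print(root.find_repository("100.xyx/repo1"))
```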

So, in summary: CORDRA is a reference model, not running code, an 'identifier system', an architecture, with overall community implementations. The ADL prototype is operational and running, we are testing next week, the demo is scheduled for Orlando on December 7 or thereabouts at a conference. The plan is to go live in January, maybe a CORDRA fest in February of 2005 in Australia.

More information: http://www.lsal.cmu.edu, http://lsal.org and http://cordra.org (coming soon)