Stephen's Web ~ Resource Profiles

Stephen Downes

Knowledge, Learning, Community

Nov 22, 2003

This journal article was published as Resource Profiles in Journal of Interactive Media in Education 2004(1), part 6. Online May 20, 2004.

1. Introduction

1.1 Abstract

The idea of a resource profile is that it is a multi-faceted, wide ranging description of a resource. A resource profile conforms to no particular XML schema, nor is it authored by any particular author. Additionally, unlike traditional resource descriptions, which are presumed to be instantiated as a single digital file and located in a particular place, a resource profile may be distributed, in pieces, across a large number of locations. And there is no single canonical or authoritative resource profile for a given resource. This paper describes the need for resource profiles, outlines their major conceptual properties, describes different types of constituent metadata, and examines the use of resource profiles in practice.

1.2 What is a Resource?

Much of the thinking and design behind the concepts outlined in this paper is based on the idea of learning objects. This paper deliberately abstracts from the more usual mode of discourse, not in order to introduce unnecessary ambiguity, but to capture some of the ambiguity already inherent in the concept of the learning object and to place it in a light where it may be examined without a predefined conception.

The term 'learning objects' is based on the merger of two distinct concepts, neither of which is universally endorsed by practitioners in the field. The first term, 'learning', seems to imply that the item in question must have some pedagogical value. (Magee and Friesen, 2001) But either a statement of this requires that a particular theoretical approach be presupposed, in which case proponents of different theories will not be in concord, or a tacit assumption of some common definition leaves the term so vague as to allow almost anything to qualify. The second term, 'object', presupposes a specific type of software entity, derived originally from the concept of object oriented programming, in which the resulting digital asset would support the concepts of inheritance, internal variables, and internal functions. (Downes, 2001) A great number of learning objects do not satisfy any of these criteria, and the original conception is now long lost in practice.

Instead, the approach taken in this paper will be to discuss 'resources' generally, and it will be stipulated that a 'resource' may be anything that may be described in a 'resource profile'. This latter term is the subject of the paper as a whole, but in brief, what may be said of a resource profile is that it is an aggregate description of a resource. A 'resource', therefore, is anything that, for whatever reason, someone has found necessary or useful to describe, where the recommended structure for such descriptions is outlined in this paper.

The discussion and debate surrounding learning objects is but one instance of many attempts to identify what may be considered to be 'basic' or fundamental classes of resources. The term 'learning objects' presupposes, in other words, that resources may be divided into categories in two ways: 'learning' and 'non-learning'; 'objects' and 'non-objects'. A slight examination of the field suggests many more ways of classifying resources: 'digital' and 'non-digital' (IEEE, 2002), 'data' and 'metadata', 'text-based' and 'multimedia', and more. No doubt each of these distinctions will be useful within a given context. But it is by no means a straightforward matter to make such distinctions or to use them in a productive manner, much less obtain universal agreement that one, rather than the other, is a fundamental or essential categorization of resources.

For example, consider the distinction between 'data' and 'metadata', a commonly used and widely understood lexicon. How do we determine whether a given resource is a piece of data or a piece of metadata? It is said that metadata is 'data about data'. But metadata may itself be described, which is why the IEEE-LOM standard, for example, has a category titled 'meta-metadata'. Do we thence consider the metadata in the IEEE-LOM file to be data? (Bray, 2003) Obviously, there is a sense in which it is useful to think of it as data, and another sense in which it makes sense to think of it as metadata. This is a general issue. It is not possible to determine, based on the format or even the contents of a given file, whether it is a piece of data or a piece of metadata, because in a trivial sense, all data is 'about' something and can, in turn, have something that is 'about' it.

In this paper, therefore, no prior assumption is made regarding what may, or may not be, a resource, and no prior assumption is made regarding the structural, physical, or other characteristics of a resource. What makes something a resource is nothing more than the fact that somebody, at some time, considers it to be a resource. The definition of 'resources' thus offered in this paper is an ostensive definition: those things that we can and in fact do treat as resources, are what will be considered resources.

2. Describing Resources

2.1 Getting the Description Right

The purpose of this section of the paper is to state the problems to be addressed in the discussion to follow. We assume for the sake of argument that the purpose of a resource network is to enable people to create, store, locate, and retrieve resources. (IMS, 2003; Oliver, 2003) It is thus necessary at each stage of the process to be able in some way to distinguish one resource from another in a reliable manner; otherwise access to resources would be random. A common means of distinguishing items one from another is to give them a name, and this will be discussed below. However, while the practice of naming resources allows us to avoid confusing them with each other, naming alone will not support the functions required of a resource network. If we had, say, only the names {'1', '2', ..., '10025452'} to work from, we would have no means of deciding whether resource '2' would be a better candidate for a given purpose than resource '3545'.

We need to describe resources, that is, we need to be able to associate the having (or not having) of a given property with a set of resources. At first, the practice of describing resources may appear to be simple and straightforward. However, when a system of description is pressed a bit it becomes evident that it is fraught with difficulties. To take a simple example, suppose that resource '23255' is what we commonly call an 'apple'. The use of the term 'apple' is itself the beginning of a description; it places the resource into a specific category based on a certain set of properties presumed to be had by the resource: that it is a 'pome', for example, or that it 'contains' a 'core' and 'seeds'. The use of this vocabulary in turn presupposes not only a set of logical relations ('is a type of', 'contains') but also a specific vocabulary generally agreed upon by a linguistic community.

Compounding the difficulties in assigning descriptions to resources is the expectation that the description will be 'right', that is, that the description we apply to a resource will in some way be 'true' or 'accurate' or even 'useful'. This requirement introduces a host of new issues to the description of objects, a factor that is compounded by the use of differing metrics for the evaluation of the 'rightness' of a description. Though the philosophical literature is replete with models and strategies, a short survey will be sufficient to make the point. On one theory, a description is 'right' if the object, in fact, has the property being described. This theory, however, leaves open the question of the description of fictional objects ('Narnia', 'unicorns') and the attribution of subjective properties ('beautiful', 'honest'). A second theory proposes that a description is 'right' if it coheres with a logical or linguistic structure of descriptions. This theory, however, leaves open the possibility of systemic error or theoretical bias ('phlogiston', 'drives'). A third approach requires that a description be 'useful'. This theory, however, begs the question of what counts as 'useful' (does it mean 'cash value', does it mean 'utility'?).

These larger questions will be set aside as essentially unsolvable. What this means, for all practical purposes, is that the system of description we adopt cannot presuppose any of three major sets of criteria: the vocabularies used to name either objects themselves or properties of objects; the set of logical relations between logics; and the standard of 'rightness' of a description. None of these are presupposed because there is no means to pick between one or another, and while we may each of us express preferences in our work and our day-to-day lives, it is only a remote possibility that we would ever reach consensus on any of them.

To draw out and illustrate this point, please allow me to expand on some major areas where the 'rightness' of a description poses significant problems for current approaches to learning object metadata. I would point out that these are difficulties that cannot be addressed through better practice; they are structural flaws in the current system employed to describe learning objects.

2.2 Multiple Descriptions

There is a presumption implicit in the structure of learning object metadata that there exists a one-to-one relationship between a 'learning object' and the metadata used to describe that object. Even the slightest examination of the nature of digital resources shows that this is not the case.

Technology now exists to take the same 'resource' and to output it in a variety of formats. The application software 'Cocoon', for example, uses as input resources described in XML and outputs instances of the resource in HTML, PDF, plain text, or any of a number of formats. (Levitt, 2000) Moreover, Cocoon will output, on request, either the entire content of a resource, or only partial representations of the resource. Thus, for example, we may obtain an HTML version of the full text of 'The Red Headed League' or we may obtain a PDF version of the outline of the Conan Doyle short story. Which of these constitutes 'the' resource? It should be clear that there is no correct answer to the question. In a related case, image archives often use the same digital contents to produce an 'image' and a 'thumbnail' of the image. (Norman, 2003) Which of these constitutes 'the' resource?

The possibility that works may have distinct representations is already a matter that has been addressed by the publishing industry. In the FRBR standard, for example, a four-level description of published works is employed: a 'work' is realized through an 'expression', which is embodied in a 'manifestation', which is exemplified by an 'item'. (Madison, 1997; Oliver, 2003) Each of these, in turn, has a set of associated properties. A 'work', for example, will have a 'title', 'form', 'date', and more. In the FRBR, "A Work is an abstract entity; there is no single material object one can point to as the work. We recognize the work through individual realizations or expressions of the work, but the work itself exists only in the commonality of content between and among the various expressions of the work." (Oliver, 2003)
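The four-level structure can be illustrated with a minimal sketch. The class and field names below are assumptions made for illustration and are not taken from the FRBR specification; the example locations are invented. What the sketch shows is that the 'work' has no location of its own and is reachable only through its items.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single exemplar, for instance one file at one address."""
    location: str


@dataclass
class Manifestation:
    """An embodiment of an expression, for instance an HTML or PDF rendering."""
    media_format: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of the work, for instance the full text or an outline."""
    form: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract entity; it carries a title but no location."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Collect every concrete item through which the work can be reached."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Hypothetical data: these addresses are placeholders, not real locations.
work = Work(
    title="The Red Headed League",
    expressions=[
        Expression("full text", [Manifestation("HTML", [Item("http://example.org/league.html")])]),
        Expression("outline", [Manifestation("PDF", [Item("http://example.org/league-outline.pdf")])]),
    ],
)

item_locations = [item.location for item in work.all_items()]
```

Each level would carry its own properties, so a description attached to the HTML item need not hold for the PDF item, even though both belong to the same work.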

Another source for a multiplicity of description arises in the case of what may be 'subjective' descriptions. Take, for example, the Kevin Costner film, 'The Postman', widely derided by the critical press and described as the worst film of 1997. (Ryan, 1998) The Razzies have their opinion; I have mine, and would rank 'The Postman' as one of the better films of the year. Leaving aside the question of which assessment is 'right', we have a case here in which two distinct descriptions exist for a single film, one in which the film is classified as 'worst' and another in which it is classified 'not worst'. It is clear that there can be no single value for any given subjective description, by definition.

Much of the metadata in IEEE-LOM could be classified as subjective metadata. IEEE-LOM 5.3, 'interactivity', is a measure that, without an agreed upon metric, "can only yield subjective entries from the developers of learning systems." (Schulmeister, 2001) In addition, IEEE-LOM 5.4, 'semantic density', is a "subjective measure of the resource's usefulness as compared to its size or duration." (Sutton, 1999) In any case where such a subjective assessment is called for (and there are many more), we are automatically presented with the possibility of differing descriptions for any given resource. One observer may describe a learning object (or a movie) as 'too complicated for average viewers', while another may say it is 'challenging but accessible'.

2.3 The Problem of Trust

A second major problem regarding the description of resources revolves around the assumption that the person or organization providing the description will be motivated to provide an accurate description. The history of metadata is not reassuring on this point, even when it comes to what may be construed as 'objective' accounts of resource properties.

The HTML standard included the option for developers to include in document heads 'Meta' tags in order to provide content descriptions. The purpose of Meta tags in HTML documents was (and remains) exactly the same as the purpose of contemporary metadata. Meta tags were used by search engines in order to locate and organize web contents. Their use proved to be an unmitigated failure.

In "Death Of A Meta Tag," for example, Danny Sullivan summarizes, "Experience with the tag has showed it to be a spam magnet. Some web site owners would insert misleading words about their pages or use excessive repetition of words in hopes of tricking the crawlers about relevancy." (Sullivan, 2002) And Andrew Goodman offers this assessment: "Metatags, as many in the industry are aware, were an early victim, succumbing to the opportunism of web site owners. Marketers, particularly operators of porn sites, which made up much of the money-making power of Internet commerce circa 1995, made search engines like Altavista look pretty silly. Search engines which looked at and took metatags seriously were riddled with spam (insincere pages which manipulated their metatags in order to rank higher in searches) until they began more aggressively filtering spam with increasingly sophisticated ranking methods and filters." (Goodman, 2002) As Cory Doctorow comments, "When poisoning the well confers benefits to the poisoners, the meta-waters get awfully toxic in short order." (Doctorow, 2001)

In the field of metadata proper, the signs of similar information pollution are beginning to be noticed. The author of the Paintball Channel on the Internet Topic Exchange, an index of RSS feeds organized by topic, complains for example that "some suckers are using this media to air their dirty spam." (Jotajota, 2003) And while some suggest that, due to spam-blockers and harvester filters, RSS solves the spam crisis (Naraine, 2003), it should be evident that it does not. There is no guarantee inherent in the RSS format - or any XML format - that the information placed into the file will be accurate. As Kevin A. Burton writes, "RSS is not the solution to the spam problem. The solution to the spam problem is a distributed trust metric. The major problem here is that this would require a lot of overhaul to the existing email infrastructure." (Burton, 2003)

In the field of learning object metadata there exist numerous openings for resource providers to insert false or misleading data. This will become evident once the use of metadata to distribute commercial learning content for sale becomes more widespread. A common value for 'typical age range', for example, will be '2-99' (on how many games for sale in stores have we seen this already?). Categorizations will be needlessly broad. 'Interactivity' will always be 'high', even if the resource is a static web page. Should the range of learning object metadata expand (as I will suggest below) and more overtly evaluative metadata be included, vendors will consistently rate their material as 'best', 'cheapest' and 'most effective'. While there is no doubt that there is a great deal of honesty in the academic community, there is just enough dishonesty to undermine a system of descriptive metadata based on trust.

Untrustworthy metadata is already beginning to be seen in learning object metadata. Friesen and Anderson (2003) report observing metadata descriptions that are "more promotional than descriptive." IEEE-LOM and similar metadata standards have no means of addressing this. The presumption behind IEEE-LOM seems to be that reliable content authors or professional indexers would create metadata, leaving normal human error as the only major cause of disinformation in learning object metadata. If this was the presumption, it was not well considered.

3. Resource Profiles

3.1 Overview of the Concept

The idea of a resource profile is that it is a multi-faceted, wide ranging description of a resource. A resource profile conforms to no particular XML schema, nor is it authored by any particular author. Additionally, unlike traditional resource descriptions, which are presumed to be instantiated as a single digital file and located in a particular place, a resource profile may be distributed, in pieces, across a large number of locations. And there is no single canonical or authoritative resource profile for a given resource.

The term 'profile' was chosen because it allows an easy analogy to be drawn between a resource profile and the profile that might be created of a person. The traditional resource description (such as a learning object metadata record) may be seen as similar to a person's resume or curriculum vitae. Typically authored by the person it describes, it contains some essential information and selected highlights from that person's career and volunteer life. But when, say, an investigative agency is trying to come to a complete understanding of a person, a resume would be only one piece of the puzzle. A large number of additional records would be consulted, such as the person's driver's license, driving history, academic transcripts, credit record, and criminal record. Friends may be interviewed, bill payments examined, and mail on and offline about the person may be read. A much more complete picture - a profile - is constructed from these various sources.

The difference between the completeness and accuracy of the information obtained in a resume as compared to a personal profile is striking. While a resume consists of a small set of information and is authored by the person, a profile consists of a large set of information authored by many people. While the trustworthiness of a resume may be cast into question, particularly if the person has something to gain from a glowing report, the trustworthiness of a profile is much higher, because data are submitted by people with no particular stake, and because different claims may be correlated with each other and with the original resume. If we wished to consider someone for a teaching position, we would be much better guided by reference to a profile than a resume; even the most minimal scrutiny involves the checking of references, and a more thorough examination would review citations, reviews and other commentary regarding the person's work. The same reasoning applies when considering the selection of a learning resource: it is the profile, not the description, that will best meet the objectives set out above, of being able to create, store, locate, and retrieve resources.

In this section of the paper we will look at some of the defining characteristics of resource profiles. In the next section, we will survey some of the major components of resource profiles. The final section will consider questions surrounding the generation of resource profile data and its organization into a metadata network.

3.2 Vocabularies

A major underlying principle of resource profiles, drawn from the Resource Description Framework (RDF) [ref], is that resource profiles may be constructed from multiple vocabularies. Any statement within a resource profile is at its core what RDF calls a 'triple' having the following form: <subject> <attribute> <value>. The <subject> is the resource being described by the profile, and is generally assumed. Thus, a profile will contain statements of the form <attribute> <value>. In common parlance, the attribute is a metadata 'tag' while the value is the 'value' of that tag. Thus, in a metadata statement such as 'title:As You Like It', the attribute is 'title' and the value is 'As You Like It'.
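As a rough illustration of the statement form just described, the following sketch treats a profile as a list of attribute and value pairs whose subject is assumed, and expands them into full triples on demand. The names `Triple`, `parse_statement`, and `expand_profile`, and the identifier `urn:example:as-you-like-it`, are hypothetical and are not part of RDF or of any existing library.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    attribute: str  # the metadata 'tag'
    value: str      # the value of that tag


def parse_statement(text: str) -> Tuple[str, str]:
    """Split 'attribute:value' at the first colon only, so values may contain colons."""
    attribute, value = text.split(":", 1)
    return attribute.strip(), value.strip()


def expand_profile(subject: str, statements: List[Tuple[str, str]]) -> List[Triple]:
    """Attach the assumed subject to each (attribute, value) pair."""
    return [Triple(subject, attribute, value) for attribute, value in statements]


# Hypothetical identifier for the resource; the profile itself omits it.
pairs = [parse_statement("title:As You Like It")]
triples = expand_profile("urn:example:as-you-like-it", pairs)
```

Because the subject is supplied only at expansion time, the same pairs could be collected from many places and attached to one resource later.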

The principle of multiple vocabularies has therefore two instances. The first instance is that multiple vocabularies may be used to define the range of possible attributes (tags). This is formalized in RDF through the use of 'namespaces' or schemas. The RDF schema "specifies mechanisms that may be used to name and describe properties and the classes of resource they describe." (W3C, 2003) The second instance is that multiple vocabularies may be used to define the range of possible objects (values). This is formalized in RDF through the use of 'ontologies'. An ontology is "an explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them." (Paskin, 2003) In other words, "An ontology... is constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary." (Bechhofer, 2003, Introduction)

In practice, these two are typically combined. That is, the nature of the property may inherently define the set of possible values; this is part of the purpose of ontologies. For example, if we have a tag called <colour> then the range of possible values is clear: {'red', 'orange', 'blue', ...}. But in many cases this is not (yet) defined, and in many cases, the relationship is not clear. Therefore, it is useful to think (at least conceptually) of the two types of vocabularies as being separate. So, suppose we had a tag such as <colour>red</colour>. The use of the <colour> tag is then specified by a schema, and the list of possible colours is obtained from a vocabulary. In general (still thinking conceptually), the format is <schema:attribute>vocabulary:value</schema:attribute>. Extended, we could thus represent a statement as follows: <schema:colour>spectrum:red</schema:colour>.
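The conceptual separation between the vocabulary of attributes (a schema) and the vocabulary of values can be sketched as follows. Everything here is an assumption made for illustration: the schema name `appearance`, the vocabulary `web`, and the lookup tables are hypothetical, and only the `spectrum:red` value comes from the text.

```python
from typing import Dict, Set, Tuple

# Hypothetical value vocabularies: each names the terms it allows.
VOCABULARIES: Dict[str, Set[str]] = {
    "spectrum": {"red", "orange", "blue"},
    "web": {"#ff0000", "#ffa500", "#0000ff"},
}

# Hypothetical schemas: each attribute lists the value vocabularies it accepts.
SCHEMAS: Dict[str, Dict[str, Set[str]]] = {
    "appearance": {"colour": {"spectrum", "web"}},
}


def split_qualified(name: str) -> Tuple[str, str]:
    """Split 'prefix:local' at the first colon and reject unqualified names."""
    prefix, _, local = name.partition(":")
    if not prefix or not local:
        raise ValueError(f"not a qualified name: {name!r}")
    return prefix, local


def is_valid(attribute: str, value: str) -> bool:
    """Check the attribute against its schema and the value against its vocabulary."""
    schema_name, attribute_name = split_qualified(attribute)
    vocabulary_name, term = split_qualified(value)
    accepted = SCHEMAS.get(schema_name, {}).get(attribute_name, set())
    return vocabulary_name in accepted and term in VOCABULARIES.get(vocabulary_name, set())


same_attribute_two_vocabularies = (
    is_valid("appearance:colour", "spectrum:red"),
    is_valid("appearance:colour", "web:#ff0000"),
)
```

The same attribute accepts values from two different vocabularies, which is the sense in which a profile is not confined to one schema or one vocabulary.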

By glossing over the technical details, we are able to extract from the preceding example the essential point: that a resource profile is not confined to the use of only one schema or the use of only one vocabulary. This becomes clearer when we look at the profile of a person. It is clear that there are many ways to describe a person. So, too, with resources in general.

A person, for example, may have an 'appearance'. What we mean by 'appearance' is defined in a schema, and may include various properties such as 'colour', 'height', 'width', and more. But a person may also have an 'education', which may include such properties as 'degrees', 'certificates', and 'workshops'. Not all schemas apply to all people. A driving person would have a 'driving record' while a non-driver would have none; a criminal person would have 'priors' while such a description makes no sense for the law-abiding. Similarly, a particular property may be described in a number of ways. A person's height, for example, may be described in terms of 'feet' and 'inches', or it may be defined in terms of 'centimeters'. Their 'identification number' may use the Canadian 'Social Insurance Number' or the American 'Social Security Number'.

None of these descriptions would be useful when describing a learning resource, of course; it does not even make sense to think of a learning resource as having a criminal record (only humans have criminal records, a fact that would, at some point, be recorded in an ontology). A learning resource might have a 'height' and a 'width', but typically only if it is an image; a text document does not have dimensional properties. While both people and learning resources may have a property called 'size', the person's size will be expressed as (say) a diameter, while an image's size will be expressed in bytes. Sometimes learning resources have more in common with people than they do with each other; a law professor and a law book may both have as a 'location' the Law Library, but a digital transcript will have as a 'location' only a URL.

It should be clear from this discussion, then, that different sets of properties apply to different types of resources. Because there are many types of learning resources, it follows that learning resources ought to be described differently, with different sets of properties. An approach, therefore, such as that taken by IEEE-LOM, where every resource is described with a single set of properties, is inappropriate for this domain.

That said, IEEE-LOM already recognizes that there are different types of metadata. This may be seen by the division of the LOM into nine separate categories: general, lifecycle, technical, and the like. (IEEE, 2003) Each of these different categories may be viewed as being defined by a separate schema. This is in fact exactly the approach taken by Nilsson (2003) in the RDF binding of LOM metadata. Where appropriate, he replaces IEEE-LOM schema elements with Dublin Core elements. In RSS-LOM (Downes, 2003) different schemas are defined as part of the RSS 1.0 protocol (Swartz, 2000) in order to create a combined format.

IEEE-LOM also allows for various vocabularies. The classification element, for example, contains two distinct components: a reference to the taxonomy being used, and the value of the current resource within that taxonomy. As the CanCore guidelines explain, "Classification element category is sophisticated and complex, providing elements for identifying and describing the purpose of the classification, the source, taxonomic value and identifier associated with the classification." (Friesen, 2003) The use of external vocabularies in IEEE-LOM is restricted, however. In a resource profile, the use of external vocabularies is unrestricted.

3.3 Authorship

Although in a certain sense a criminal is the author of his own misfortune, the authorship of the person's criminal record is not left to the person described, for the reason that such people will be motivated to falsely report their prior convictions. In a similar manner, a person's academic transcript is authored by the university registrar, and not the person being described. The same reasoning extends to the description of other types of resources. Except in certain notable cases, movie reviews are not authored by movie studios, and book reviews are penned by people other than the author. Some descriptions are not authored by a person at all. A person's power or water usage is recorded by a meter and fed directly into a central database, where it is used to issue power and water bills or to suggest targets for possible police investigations.

Learning object metadata files, however, are, like many others, assumed to have a single author. As we can see from the CanCore metadata guidelines, there is no provision for different authorship for different bits of information, save what (little) could be gleaned from the 'role'. (Friesen, 2003a) A learning resource profile, however, may have many authors. In principle, each statement within a learning resource profile could have a different author (though in practice, different authors will create different sets of tags).

The idea of attributing comments to authors is called 'reification'. Wikipedia (as of October 31, 2003) defines the concept: "In knowledge representation, reification is sometimes used to represent facts that must then be manipulated in some way, for example to compare logical assertions from different witnesses to determine their credibility. The message 'John is six feet tall' is an assertion of truth that commits the sender to the fact, whereas the reified statement, 'Mary reports that John is six feet tall' defers this commitment to Mary. In this way, the statements can be incompatible without creating contradictions in reasoning." (Wikipedia, 2003) The concept of reification is explicitly discussed as such by Tim Berners-Lee. (Berners-Lee, 1999) And it is already instantiated in various semantic web implementations, such as Annotea. (Miller, 2003)

Tracking the authorship of metadata statements requires that author information be contained in the metadata. Author information must be contained in two places: in the first place, to designate the author of a given metadata file, which I'll call the 'metadata author'; and in the second place, to designate the author of tag contents, which I'll call the 'element author'. In IEEE-LOM, the metadata author is indicated in the metametadata. Other contributors may also be indicated in this area. Attributing element authors is not so straightforward; while Annotea describes the use of additional metadata tags, a more direct approach is preferred here: place a 'source' attribute within the tag pointing to the original metadata where the assertion was first made. Hence, for example, if we are depending on a second author for information about the resource's classification, the classification tag would carry a 'source' attribute pointing to that author's metadata file. Information about the authorship of the classification metadata in this example would therefore be obtained by dereferencing the source and locating it within the metametadata.
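The distinction between metadata author and element author can be sketched in a few lines. The record below is hypothetical: its element names, the address in the 'source' attribute, and the author name are invented for illustration and do not follow the IEEE-LOM binding. The sketch only reads the attribute; it does not dereference the source.

```python
import xml.etree.ElementTree as ET
from typing import Dict, NamedTuple, Optional

# Hypothetical metadata file. Only 'classification' was asserted elsewhere.
RECORD = """
<record>
  <metametadata><contribute role="creator">First Author</contribute></metametadata>
  <title>Sample Resource</title>
  <classification source="http://example.org/reviews/42.xml">music theory</classification>
</record>
"""


class Attribution(NamedTuple):
    author: Optional[str]  # known directly when the metadata author made the claim
    source: Optional[str]  # otherwise, where the original assertion can be found


def element_attributions(xml_text: str) -> Dict[str, Attribution]:
    """Map each descriptive element to its metadata author or to its source."""
    root = ET.fromstring(xml_text.strip())
    metadata_author = root.findtext("metametadata/contribute")
    attributions: Dict[str, Attribution] = {}
    for child in root:
        if child.tag == "metametadata":
            continue
        source = child.get("source")
        if source is None:
            # No source attribute: the claim belongs to the metadata author.
            attributions[child.tag] = Attribution(metadata_author, None)
        else:
            # The element author would be found in the metametadata at this address.
            attributions[child.tag] = Attribution(None, source)
    return attributions


attributions = element_attributions(RECORD)
```

An aggregator following this approach would fetch the file named in `source` and read its metametadata to learn who made the classification claim.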

In previous work I have referred to metadata authored in this way as 'third party metadata' [], the idea being that metadata authored by the resource creator is first party metadata and that authored by the resource consumer is second party metadata. This term has been used in other work, sometimes as 'third party annotation' (Bartlett, 2001) or 'third party labeling' (Eysenback, 2001). Recker and Wiley (2001) use the term 'nonauthoritative metadata' to describe third party metadata: "metadata that describe the variety of real world cases in which a given resource has been reused, what we have termed 'nonauthoritative metadata', can be extremely helpful in facilitating the efficient and effective reuse of existing resources." The term 'third party' is preferred here because, while there is no doubt about the source of first party metadata, there may be, as suggested above, some doubt about its trustworthiness.

3.4 Distributed Metadata

Alluded to in passing in the previous section, this principle of resource profiles allows that the metadata for a given resource may be stored in different locations across the internet. That is, there is no single metadata file describing any given resource; metadata about the resource may be found in numerous online locations. A metadata profile is therefore constructed by aggregating the metadata available at these different locations in order to form a particular view of the resource. It follows that there may be different metadata profiles for a given resource, as different aggregators harvest different metadata from different locations, though one could define an ideal (and usually fictional) 'total' metadata profile composed of all possible metadata from all possible sources.

Again, this corresponds with the manner in which information about a person is distributed. A person's health records are stored at a hospital, their driving record at the Department of Motor Vehicles, their academic transcripts at a university, their birth information at a bureau of statistics, and the like. Very little information about a person is actually obtained from the person himself, usually only easily verifiable data such as the person's current address and telephone number. Even though a person may assert additional information in, say, a resume, this information is in fact subject to verification through reference to the originators of that information or through the production of certificates, such as a driver's license or university diploma, and not taken at face value.

In the world of learning resources, a very similar pattern may be expected and, indeed, has begun to take shape already. For example, the learning resource titled The Fugues of the Well-Tempered Clavier, by Timothy A. Smith and David Korevaar, is located in one place. (Smith, 2003) This resource has been reviewed by the MERLOT Music Review Panel, and the review is located in another place. (MERLOT, 2003) An aggregator seeking to obtain a complete profile of this resource may therefore need to obtain information from two separate locations in order to form a complete picture.
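The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any existing harvester: the two in-memory 'locations', the identifier 'example:wtc-fugues', and the field names are all invented for the example.

```python
from collections import defaultdict

# Hypothetical metadata fragments, each held at a different location.
# The identifier and field names are illustrative only.
AUTHOR_SITE = [
    {"resource": "example:wtc-fugues", "source": "author",
     "title": "The Fugues of the Well-Tempered Clavier"},
]
REVIEW_SITE = [
    {"resource": "example:wtc-fugues", "source": "review-panel",
     "review": "peer review text"},
]

def aggregate(*locations):
    """Group harvested fragments by resource identifier to form profiles."""
    profiles = defaultdict(list)
    for location in locations:
        for fragment in location:
            profiles[fragment["resource"]].append(fragment)
    return dict(profiles)

profile = aggregate(AUTHOR_SITE, REVIEW_SITE)["example:wtc-fugues"]
sources = sorted(fragment["source"] for fragment in profile)
print(sources)
```

An aggregator that harvests only the first location ends up with a smaller profile of the same resource, which is the sense in which different aggregators may hold different profiles.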

3.5 Resource Identifiers

In some discussion to follow, it will be seen that a resource cannot be identified by its location on the internet. A resource may take one of several technical forms, or a resource may be mirrored to lower distribution costs. Additionally, resource metadata may have no single internet location. Because metadata descriptions of a given resource may have different authors, and may be located in different places, there needs to be a means of knowing when two metadata resources are describing the same resource. It should be clear that the title of a work cannot serve as an identifier either. For example, the title of this paper is duplicated by a description of services available to senior citizens (Senior Citizen's Guide, 2003), an account of agriculture in Kyrgyzstan (Fitzherbert, 2000), and a mainframe applications utility. (Leroy, 2002.) This difficulty is resolved by means of a resource identifier.

The same sorts of difficulty exist in the realm of personal identification. Though a person may have a name, just as a resource has a title, this name may be a duplicate. There exists a Stephen Downes who is a restaurant critic in Melbourne, a Stephen Downes who works for the National Research Council, a Stephen Downes who is a visual artist in New York, a Stephen Downes who is a professor of philosophy at the University of Utah, and a Stephen Downes who was an NDP candidate in Nova Scotia. Any individual's physical address may change over time, and other identifying information, such as website address, email address, or phone number, may also change.

Organizations respond to this difficulty by assigning each person a unique identifier. Examples of identifiers in Canada include Social Insurance Numbers, health care numbers, and driver's license numbers. Additionally, organizations, such as universities, will also assign their own unique identifiers. What is common about these identification systems is that each identifier is unique, and each identifier is stored in a canonical location (which may be called a 'registry'). In turn, these identifiers are associated with (what may be) less permanent information about a person, such as the person's name or address. When a less permanent feature of a person changes, the person is required to update the registry with the new information. Mechanisms are in place in order to prevent the fraudulent change of a registry.

In the realm of digital resources, the idea of resource identifiers has been proposed on numerous occasions. Books, for example, may be identified by their ISBN (ISBN, 2003); serials by their ISSN. (ISSN, 2003) A prominent initiative, the Digital Object Identifier system, (DOI, 2003) "provides a framework for managing intellectual content, for linking customers with content suppliers, for facilitating electronic commerce, and enabling automated copyright management for all types of media." The DOI syntax is an ANSI standard, Z39.84, (NISO, 2000) and is defined in two parts: a prefix, which identifies the registration agency, and a suffix, which is the unique code assigned by that agency. (NISO, 2000) Because the DOI registration system is a commercial enterprise, however (OASIS, 2003), organizations such as eduSource have adopted their own format, but again with the same two-part structure.
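The two-part structure can be illustrated with a short Python sketch. It assumes, for the purposes of the example only, that the two parts are written as 'prefix/suffix' and separated at the first slash; the identifier shown is made up and the names ResourceId and parse_identifier are hypothetical.

```python
from typing import NamedTuple

class ResourceId(NamedTuple):
    registry: str   # prefix: names the registration agency or registry
    local_id: str   # suffix: unique code assigned within that registry

def parse_identifier(text: str) -> ResourceId:
    """Split 'prefix/suffix' at the first slash; the suffix may contain slashes."""
    registry, sep, local_id = text.partition("/")
    if not sep or not registry or not local_id:
        raise ValueError(f"not a two-part identifier: {text!r}")
    return ResourceId(registry, local_id)

rid = parse_identifier("10.9999/example.2004/1")  # made-up identifier
print(rid.registry, rid.local_id)
```

Two metadata files that carry the same (registry, local_id) pair can then be treated as descriptions of the same resource, whatever their own locations or titles.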

There is from time to time a call for a single standard for digital resource identification, just as there is from time to time a similar demand for a single standard for identifying people. Over time, some such standard may become a de facto universal standard (as has Canada's Social Insurance Number for Canadians); however, such calls should be resisted. Organizations may find it more convenient to employ an internal identifier scheme, employing a public scheme only when the resource is published or made public. Additionally, the use of multiple identifier systems is more able to withstand a catastrophic corruption, as even if one registry is corrupted, reference to additional registries may be employed to establish the original identity.

3.6 Models

A model is an XML description that is used for multiple purposes. The purpose of a model is to store information in one place in order to allow it to be used in multiple places. A model functions in much the same way as a Cascading Style Sheet (CSS). (W3C, 2003a) A full definition of a given style is stored in a CSS file; the CSS style is imported by the web page in which the style will be used, and HTML in the web page invokes the style by referring to it by name. In resource profiles the second step is omitted; the external resource is invoked and implemented within the body of the XML.

In a certain sense, models are already supported in RDF. For any given property value, instead of using a string to indicate the value, an XML author may instead refer to an external resource. For example, the 'creator' of a document may be 'Stephen Downes'. However, this reference is vague (there may be, as suggested above, other people named 'Stephen Downes') and it is incomplete (what is the current 'email address' for the author?). RDF allows the 'creator' of the document to be identified as an external 'resource' by giving the property an rdf:resource attribute that holds the URI of that resource, in place of a string value. (W3C, 1999) This is functionally equivalent to embedding vcard information into the XML (as proposed by IEEE-LOM).
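As an illustration of such a reference, the following Python sketch parses a small RDF/XML fragment in which the creator is given as a resource rather than a string. The RDF and Dublin Core namespace URIs are the standard ones; the document and creator URIs are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"rdf": RDF, "dc": DC}

# The document and creator URIs below are placeholders.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/papers/resource-profiles">
    <dc:creator rdf:resource="http://example.org/people/stephen-downes"/>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(DOC)
creator = root.find("rdf:Description/dc:creator", NS)
# The creator is a pointer to another description, not a literal name.
creator_uri = creator.get(f"{{{RDF}}}resource")
print(creator_uri)
```

A consumer of this metadata would follow the URI to obtain the current name and email address, instead of trusting a string that was copied into the record at authoring time.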

In general, the use of external resources in this manner should be encouraged, and in reliable metadata networks, should be mandatory (conversely, XML which does not refer to external resources in this way should not be deemed trustworthy). The use of string data to refer to and describe external resources, such as authors and organizations, even if it is encoded in (say) vcard format, is fraught with danger. Such information will almost certainly change. Aside from the ambiguity of reference, pointed to above, people change email addresses and organizations (such as Docent and Click2Learn) merge and change names.

Because even the URLs of such resource metadata files will change over time, it is desirable when referring to an external resource to employ a permanent URL such as is provided by PURL (PURL, 2003) or a similar registry of resource locations. In such a case, the mechanism for referring to external resources would come to resemble (and, in fact, be a part of the same system as) the resource identification service described in the previous section. Hence, the reference to the external resource would be described in two parts: the name of the resource registry, and the unique identifier held by that registry. The registry, in turn, would either redirect an enquiry to the current location of the resource (as PURL does) or would return a set of metadata, which ought to include the location of that resource.
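The two registry behaviours just described (redirecting to a location, or returning metadata that includes the location) can be sketched as follows. The registry names, identifiers and URLs are hypothetical, and a real registry would of course be a network service rather than a dictionary.

```python
# Hypothetical registries: one redirects, the other returns a metadata record.
REGISTRIES = {
    "redirecting-registry": {
        "people/sd": {"redirect": "http://example.org/staff/downes.rdf"},
    },
    "metadata-registry": {
        "people/sd": {"metadata": {
            "location": "http://example.org/staff/downes.rdf",
            "name": "Stephen Downes",
        }},
    },
}

def resolve(registry: str, identifier: str) -> str:
    """Return the current location for a (registry, identifier) reference."""
    entry = REGISTRIES[registry][identifier]
    if "redirect" in entry:
        return entry["redirect"]
    return entry["metadata"]["location"]

location = resolve("metadata-registry", "people/sd")
print(location)
```

In either case the referring metadata stores only the two-part reference, so a change of location is recorded once, in the registry.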

In a resource profile, a 'model' is employed in the same manner. A model has two parts: the name of the registry holding the model, and the unique identifier for the model. A model differs from an external resource, however, in that it is a partial metadata file and it does not describe any given resource. Rather, it describes a resource type, and the data contained in the model is intended to be descriptive of the current resource.

It is useful to think of resource models in much the same way we think of stereotypes as applied to people (but without the negative connotations). For example, if we have a person named 'Salty', we could add to the description of this person by invoking a specific model: 'sea captain'. Knowing that Salty is a sea captain immediately tells us many things about him: that he wears a captain's hat, that he has a peg leg, that he sings sea shanties. These details are not inferred (as would be the case with an ontology); they are contained in the model itself. The model 'sea captain' just is the XML that holds these values: "captain's" for the hat, "peg" for the leg, and "sea shanties" for the songs. The model does not describe any person in particular, but when included as part of a resource profile, adds specific details to the description of the resource.
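A minimal sketch of how a model might be invoked and expanded is given below. The element names (hat, leg, songs, model), the registry name and the attribute names are assumptions made for illustration; only the values "captain's", "peg" and "sea shanties" come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of models, keyed by (registry name, model identifier).
MODELS = {
    ("example-registry", "sea-captain"):
        "<model><hat>captain's</hat><leg>peg</leg><songs>sea shanties</songs></model>",
}

# A description that invokes the model instead of repeating its contents.
PROFILE = """
<person>
  <name>Salty</name>
  <model registry="example-registry" id="sea-captain"/>
</person>
"""

def expand(profile_xml: str) -> dict:
    """Copy the elements of each invoked model into the description."""
    root = ET.fromstring(profile_xml)
    description = {}
    for child in root:
        if child.tag == "model":
            key = (child.get("registry"), child.get("id"))
            model = ET.fromstring(MODELS[key])
            description.update({element.tag: element.text for element in model})
        else:
            description[child.tag] = child.text
    return description

salty = expand(PROFILE)
print(salty)
```

Nothing is inferred in this expansion: the details are simply read out of the model and attached to the description that invoked it.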

A model is used by a metadata author for several reasons. The use of metadata models may greatly simplify the creation of metadata. For example, in describing the digital rights associated with a resource, the associated ODRL file may run into several pages of detail. (W3C, 2002) However, if the relevant digital rights model is given a (recognizable) name, then this information may be very simply added to a metadata description.

A model may also be used to apply similar descriptions to multiple resources. For example, some properties of images offered by an image repository may be the same for each of 10,000 images: they may all be .gif images, 800 x 1400 pixels in dimension, with a colour depth of 8 bits (or 256 colours). This information could be stored in a model called 'portrait', and then each of the 10,000 images could declare these technical specifications with a single line of XML code that refers to the 'portrait' model.

A third reason to use a model is to withhold metadata that may be subject to subsequent change, while at the same time making the current value of this metadata available to aggregators. For example, the price of a resource offered by a commercial provider may change over time. If digital rights metadata is included in the resource metadata or content package metadata, as proposed by COLIS (Iannella, 2002), then the digital rights associated with an object cannot be changed. While this is useful (and necessary) for objects that have already been transacted, such a system is unsustainable for the delivery of rights metadata prior to the conclusion of a transaction. Once a resource is offered at $50.00, it would have to be offered at that price forever, since no reliable means would exist for changing the price once the metadata had been harvested by third parties.
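The difference between harvesting a price and harvesting a reference to a rights model can be shown with a small sketch. The model name, the record layout and the second price are hypothetical; the point is only that the harvested record never contains the price itself.

```python
# Hypothetical rights registry, maintained by the provider.
RIGHTS_MODELS = {"standard-licence": {"price": 50.00}}

# What a third party harvested: a reference to the model, not the price.
harvested = {"resource": "example:item-1", "rights_model": "standard-licence"}

def current_price(record: dict) -> float:
    """Look the price up in the model at the moment it is needed."""
    return RIGHTS_MODELS[record["rights_model"]]["price"]

before = current_price(harvested)
RIGHTS_MODELS["standard-licence"]["price"] = 40.00  # provider changes the price
after = current_price(harvested)
print(before, after)
```

The harvested record is unchanged between the two lookups, yet the aggregator sees the new price, because the value lives in one place only.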

3.7 The Concept in Retrospect

The use of metadata to describe learning resources is, in essence, an effort to create a distributed and integrated system of data management and application. The concept of the resource profile, as described immediately above, represents what could be viewed as a set of best practices for such enterprises. While on the one hand the details of the concept may be subject to further amendment and elaboration by those more familiar with the details of data management and application, they are nonetheless built on known and widely applied principles, principles that may be seen in other applications of data management but that, unfortunately, have not been applied to learning object metadata.

The metaphor of a system for the organization of personal information was used throughout for illustrative purposes, but the reference standard for the elaboration of the concept of resource profiles ought to be data management theory. Several of the properties of resource profiles described immediately above are instances of data management theory. In particular, the use of resources and models conforms to sound practices of database design and object oriented programming. The former corresponds with the principle of data normalization (Gilfillan, 2000), which could, in a nutshell, be expressed as a variant of Ockham's razor (Britannica, 2003): do not multiply entities without necessity. The use of a resource identifier corresponds with the requirement for the use of a primary key for all data; the use of external resources instead of strings corresponds to the requirement of the use of (what are sometimes called) lookup tables instead of manually entered referents.
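The correspondence with primary keys and lookup tables can be made concrete with Python's built-in sqlite3 module. The table layout, identifiers, email addresses and the second title are invented for the example; this is a sketch of the database principle, not a proposed schema for resource profiles.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Each person and resource has a primary key; resources refer to the
    -- creator by key (a lookup table) rather than by a typed-in string.
    CREATE TABLE person   (id TEXT PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE resource (id TEXT PRIMARY KEY, title TEXT,
                           creator TEXT REFERENCES person(id));
    INSERT INTO person   VALUES ('p1', 'Stephen Downes', 'old@example.org');
    INSERT INTO resource VALUES ('r1', 'Resource Profiles', 'p1');
    INSERT INTO resource VALUES ('r2', 'Another Paper', 'p1');
""")

# One update to the lookup table is seen by every resource that refers to it.
db.execute("UPDATE person SET email = 'new@example.org' WHERE id = 'p1'")
emails = [row[0] for row in db.execute(
    "SELECT person.email FROM resource "
    "JOIN person ON resource.creator = person.id ORDER BY resource.id")]
print(emails)
```

Had the email address been typed into each resource record as a string, the same change would have required finding and editing every copy.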

The use of metadata models enables inheritance. (Sun, 2003) Inheritance is a common (even necessary) feature of object oriented programming. Not only does the use of inheritance reduce the complexity of applications programming, it reduces the possibility of error and eases the work required to maintain application integrity. Inheritance also facilitates the identification of groups or classes of objects, and allows developers to predict the behaviour of objects even when information about that behaviour is not present. In more practical terms, the use of inheritance is an instance of 'not reinventing the wheel'. To force metadata authors, even automated metadata authors, to input, say, author data over and over again is a violation of that principle on a massive scale.

Finally, the development of a distributed system of metadata authoring is an instance of the aphorism 'two heads are better than one' and draws from the design and architecture of the world wide web itself. Centralized and sole source information networks have been found to present insurmountable bottlenecks to the aggregation and distribution of data. (Shirkey, 2003) Even closed data management systems presuppose multiple authors; an examination of university data systems such as Banner or Colleague will show that, even though the data itself is centralized, authorship is distributed. In addition, single-point authoring of metadata has shown itself to be unusable in a world-wide network; this method, employed in the early days of Yahoo, has been superseded by services such as Google, which employ an aggregation rather than a data entry system.

The aggregation of information about a given resource from many sources has proven to be a formidable application. Google's PageRank system, for example, depends on what are here called third party resources. One aspect of this system is to rank a page according to the number of links to that page that are contained in other pages. (Google, 2003) This provides a system of ordering search results which could not have been imagined using a system in which individual authors provide all and only the metadata describing their own pages.
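The one aspect mentioned here, counting the links to a page found in other pages, can be sketched as below. This is plain inbound-link counting over an invented three-page web, offered only to show that the ranking data comes from third parties; it is not Google's actual algorithm.

```python
from collections import Counter

# Hypothetical pages and the links each one contains.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": [],
}

def inbound_counts(links: dict) -> Counter:
    """Count, for each page, how many other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:      # a page's links to itself do not count
                counts[target] += 1
    return counts

counts = inbound_counts(LINKS)
ranking = sorted(counts, key=lambda page: (-counts[page], page))
print(ranking)
```

No page in this example says anything about itself; its position in the ranking is determined entirely by what the other pages contain.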

The concept of the resource profile itself draws on numerous existing concepts in web design and metadata, including most clearly the Resource Description Framework, but also Digital Object Identifiers and object registries, reification, FRBR, Annotea, and more. The concept described in these pages is not intended to replace any of these prior initiatives or specifications, but rather to draw on them and to convince readers to look at the concept of resource description using metadata from a different frame of reference. The most difficult part of designing metadata descriptions, including IEEE-LOM, lies in understanding exactly what it is we are trying to do, and a failure to grasp the wider picture leads to errors in specific implementations, such as the errors in IEEE-LOM that have been alluded to in passing in this paper.

4. Types of Metadata

4.1 On Types of Metadata in General

On the model suggested in the previous section, a new picture of resource metadata emerges. Instead of thinking of the metadata of a resource as a single, undifferentiated file composed of a set of standard elements, as we see for example in IEEE-LOM, it is more useful to think of the metadata for a given resource as a patchwork of metadata formats, assembled as needed to form a description most appropriate for the given resource.

Some reflection illustrates this. In some learning search services, such as the PEGGAsus project, (APEGGA, 2001) searchers are looking not only for online courses and programs, but also in-person synchronous events, such as seminars or conferences. While these are not learning objects, properly so-called, such resources should be describable in learning resource metadata. But seminars and conferences will have properties not shared by digital learning resources, a 'start time' for example. The same could be said of synchronous online conferences which, if properly designed, could qualify as learning resources.

One approach may be to simply assert that these sorts of learning resources are not described by learning object metadata. But this needlessly restricts the domain of such metadata, ensuring that no single search request could ever retrieve in the same set of results a seminar and a paper on the same topic. It is likely that as search services attempt to provide a wider range of services, they will in any case need to adapt learning object metadata. Thus viewed, such services will need a mechanism for representing all types of learning resources, not just learning objects.

In the description of types of resource metadata provided below, numerous instances of similar cases will be observed. It is hoped that the patchwork model of metadata will be well supported by example. Once the different types of metadata are enumerated, it will become evident that there are no good grounds for restricting the description of learning resource metadata to a single format.

An additional characteristic that should emerge relates to the authorship of metadata formats. Organizations, such as the IEEE-LTSC, which created IEEE-LOM, find themselves in the position of having to provide complete and wide-ranging descriptions of resource metadata. This is because they are trying to account for everything that might be needed in a single description. But one consequence of this is that they find themselves creating metadata formats in areas perhaps more suited to other authorities. An IEEE-LOM technical description for an image, for example, will look very different from the metadata format suggested by a body that specializes in image metadata.

4.2 Bibliographic Metadata

This first, and arguably core, type of metadata, Bibliographic Metadata, is related to resource authorship. It is metadata dealing with the creation, naming, publication and other intrinsic details related to the resource. In IEEE-LOM, this sort of metadata is (partially) described in the General category, and includes the identifier and catalog entries. (Friesen, 2003b) Other metadata of this type may be found in the IEEE-LOM Lifecycle category, where we find such information as version, status, contributors, and date. (Friesen, 2003c)

What should be characteristic of bibliographic metadata is that it is information intrinsic to the resource in question. That is, the fact of it being a resource implies that it would have an identifier of some sort, a title (in most cases), a creator, and a creation date. These are metadata that could only be reliably authored by the creator of the resource (or proxies of the creator, such as his or her company). Though a catalogue entry would be created externally by a registry, as described above, the creator would initiate this process and place the resulting catalogue information into the bibliographical metadata.

Numerous forms of bibliographic metadata exist. A widely accepted and standardized format is the Dublin Core. As mentioned above, Nilsson's RDF binding of IEEE-LOM replaces IEEE-LOM bibliographic metadata with Dublin Core metadata. More detailed bibliographic metadata is available for more specific types of resources. For books (including online books), the ONline Information eXchange (ONIX) format may be used. (Editeur, 2003) Magazine articles, both print and online, may be described using elements from Publishing Requirements for Industry Standard Metadata (PRISM). (PRISM, 2003) Online journal articles can be described using CrossRef, (Crossref, 2003) an application of the Digital Object Identifier standard which includes volume, issue and page numbers as part of the standard bibliographic metadata for an article. (Crossref, 2003a) Another is the FRBR specification, mentioned above. Numerous others exist. (IFLA, 2003)

It makes no sense to attempt to stuff resources with widely varying bibliographic needs into a single, non-bibliographic format such as IEEE-LOM. A learning resource profile makes no attempt to do so. Instead, it employs the bibliographic format appropriate to the resource. While at first this may seem to introduce a degree of chaos into learning resource identification, such an approach in the longer run engenders greater standards compliance, as it ensures that learning resources are described in the same manner as other resources of the same type. Moreover, it allows for a much more fine-grained description of bibliographic information, information that can be as detailed as the creator wishes to make it, and which can be abstracted as necessary for various applications.

4.3 Technical Metadata

Technical metadata describes a specific instantiation of a learning resource, what the FRBR standard would call a 'manifestation'. (Tillett, 2002, item 13) Technical metadata is given a separate section in IEEE-LOM (Friesen, 2003d) as is appropriate. Technical metadata is constrained to describing the specific physical properties of the resource. This includes such properties as the location (online or off) of the resource, its data format, and other spatio-temporal properties.

It is important to keep in mind that a resource may have more than one instantiation. As D'Arcy Norman asks, (Norman, 2003a), "Is it best to have 3 technical elements, one for each format of a resource (a GIF, a JPG, a TIF, each with their own sizes and locations - this is my personal preference), or mashing them all into one technical element (with multiple formats, locations, etc... which one points to which?)." As mentioned above, numerous online resources can be delivered in multiple formats using applications such as Cocoon, and each of these formats will have a different technical description, which will include a different location.

Detailed technical metadata may be produced for different types of resources. Video metadata, for example, has numerous metadata requirements well beyond what may be expressed in IEEE-LOM, for example, frame information, script and transcript information, camera angles, lighting, scene sequencing, and more. (Hunter and Armstrong, 1999) Video metadata standards include Digital Imaging Group (DIG) 35, MPEG-7 and Video Development Initiative (ViDe). (VMC, 2002) The Digital Imaging Group also proposes metadata standards for still images (OASIS, 2002) as does NISO's Z39.87. (NISO, 2002) Technical metadata formats also exist for non-digital resources; for example, standards such as Friend of a Friend (FOAF) are beginning to be used to describe people, (FOAF, 2003) and events. (Miller, 2001)

It is clear that for many applications such detailed metadata will not be needed (though, by means of the 'source' attribute, described above, it could still be located). For most applications, knowledge that a resource is of this or that MIME type (such as text/html or image/jpeg) will be sufficient. (IANA, 2002) What should be noted, though, is that technical metadata should describe only the physical properties of the resource. It should not, as described in IEEE-LOM, specify specific applications or players. The difficulties of this approach can be seen in the numerous errors caused by JavaScript browser detection scripts, which, by declaring that a resource requires a specific player, frequently make mistakes when newer unanticipated players are developed. Player information should be deduced from technical information, typically with reference to a centralized 'player - data type' database (much in the way a browser will associate different MIME types with different plug-ins). (Netscape, 1996)
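The separation between technical metadata and player information can be sketched as follows. The contents of the 'player - data type' table, the player names and the resource location are hypothetical; the technical metadata records only the MIME type.

```python
# Hypothetical 'player - data type' table, kept apart from resource metadata.
PLAYERS = {"text/html": "web browser", "image/jpeg": "image viewer"}

# Technical metadata states physical properties only; it names no player.
technical = {"location": "http://example.org/clavier.jpg", "format": "image/jpeg"}

def player_for(technical_metadata: dict) -> str:
    """Deduce a player from the declared MIME type."""
    return PLAYERS.get(technical_metadata["format"], "no known player")

first = player_for(technical)
PLAYERS["image/jpeg"] = "newer image viewer"   # a new player appears
second = player_for(technical)
print(first, second)
```

When a newer player is developed, only the table changes; the metadata describing the resource remains correct without being rewritten.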

A resource profile, therefore, will contain one or more sets of technical metadata, each describing a specific instantiation of the resource. Such technical metadata ought to employ a metadata format or scheme appropriate to the physical properties of the resource instantiation being described. For reference to particular instantiations, it is useful to give each technical instantiation a name (for example, to associate different rights with different versions), though this is not essential.
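One way to hold several sets of technical metadata in a single profile, in the spirit of the three-element preference quoted above, is sketched here. The class name Technical, the instantiation names and the locations are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Technical:
    name: str        # label for this instantiation (useful, not essential)
    mime_type: str   # physical format of this instantiation
    location: str    # where this instantiation can be found

# One resource, three instantiations, each with its own format and location.
PROFILE = {
    "resource": "example:image-42",
    "technical": [
        Technical("web", "image/gif", "http://example.org/42.gif"),
        Technical("photo", "image/jpeg", "http://example.org/42.jpg"),
        Technical("print", "image/tiff", "http://example.org/42.tif"),
    ],
}

def instantiation(profile: dict, name: str) -> Optional[Technical]:
    """Return the named instantiation, or None if the profile has none."""
    for technical in profile["technical"]:
        if technical.name == name:
            return technical
    return None

print_version = instantiation(PROFILE, "print")
print(print_version.location)
```

Because each instantiation is a separate element, there is no ambiguity about which location belongs to which format, and rights or other metadata can refer to an instantiation by name.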

4.4 Classification Metadata

The previous two types of metadata, although they may use varying schemas, have in common a single origin - typically the creator of the metadata - and are what might be properly called 'authoritative', in the sense that they are not the subject of opinion or disagreement. Such metadata also represent the only metadata that needs to be produced by the creator of the resource, and the only metadata in which the creator's assertion may be taken at face value. It should therefore be called 'first party metadata'.

The classification of learning resources, however, is another matter entirely. Though the author of a resource may offer suggestions as to the classification of a resource, typically through the 'keywords', 'coverage' and 'classification' elements, an author will seldom have the last word when it comes to classification. In the world of literature, classification is more typically undertaken by professional librarians or indexers with a close familiarity with one or another classification scheme.

Numerous classification schemes exist: "Dewey Decimal Classification (DDC); Universal Decimal Classification (UDC); Library of Congress Classification (LCC); Nederlandse Basisclassificatie (BC); Sveriges Allmänna Biblioteksförening (SAB); Iconclass; National Library of Medicine (NLM); Engineering Information (Ei); Mathematics Subject Classification (MSC) and the ACM Computing Classification System (CCS). Projects which attempt to apply classification in automated services are also described including the Nordic WAIS/WWW Project, Project GERHARD and Project Scorpion." []

Any effort by the creator of a learning resource, or even a single indexer, to place a given resource into the appropriate place in all of these schemes would be futile. And such an effort could not even begin to organize resources into application-specific classification schemes, such as the list of 'topics' provided by Edu_RSS. (Downes, 2003a) Looking even further afield, classification schemes could (and should) organize resources not only by subject or topic, but by numerous other criteria, such as (to draw from IEEE-LOM) semantic density and interactivity.

The creation of classification schemes should be left to library organizations (who will determine global classification metrics) and professional or disciplinary bodies and organizations, who will provide more fine-grained classifications specific to a given domain or discipline. The creation of metadata describing actual classifications of resources should also be left to these agencies. In some cases, classification services will be provided by volunteer or governmental agencies, and will be available to all. In other cases, classifications will be made available as commercial services.

A resource profile for a given resource will typically contain one or more classification elements. Each such element will describe the scheme used, the placement of the resource within the scheme, and the identity of the person or authority making the classification. Classifications of learning resources would not typically be placed into the metadata provided by the creator of the learning resource, since there is no reason to believe that the classification is accurate, but would rather be harvested by an aggregator directly from the classification agency itself. For this reason, classification is the first major instance of what should be called 'third party metadata'.
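The three components of a classification element (scheme, placement, authority) can be represented as in the following sketch. The scheme names, placements and agency names are placeholders rather than real classification data.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Classification:
    resource: str    # identifier of the resource being classified
    scheme: str      # the classification scheme used
    placement: str   # where the resource sits within that scheme
    authority: str   # who made the classification

# Hypothetical records harvested from two classification agencies.
HARVESTED = [
    Classification("example:r1", "scheme-a", "class-12", "library-agency"),
    Classification("example:r1", "scheme-b", "topic-7", "discipline-body"),
    Classification("example:r2", "scheme-a", "class-40", "library-agency"),
]

def classifications_for(resource: str,
                        scheme: Optional[str] = None) -> List[Classification]:
    """Select the classifications of a resource, optionally within one scheme."""
    return [c for c in HARVESTED
            if c.resource == resource and (scheme is None or c.scheme == scheme)]

r1_all = classifications_for("example:r1")
r1_a = classifications_for("example:r1", scheme="scheme-a")
print(len(r1_all), r1_a[0].placement)
```

Keeping the authority on every record lets an aggregator decide which agencies it trusts, and lets a single resource carry placements in as many schemes as there are agencies willing to classify it.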

4.5 Evaluative Metadata

Though it may be considered to be a type of classification, the evaluation of resources is sufficiently important in the field of learning resource metadata to merit its own consideration. This is partially because of the long tradition of peer review of learning resources, and partially because evaluative metadata is uniquely illustrative of many of the concepts described in this paper.

It should go without saying that the creators and publishers of resources should not be left to evaluate their own work, particularly when there is a commercial incentive to a positive evaluation. Evaluation metadata is therefore canonical third party metadata. Not only could third party opinions be expressed, such opinions should be the only opinion expressed. It is unlikely that any aggregator would be interested in harvesting the vendors' own opinions about the quality of their products (unless they are an advertising flyer). Moreover, evaluations should not be associated with vendor-supplied metadata; in order to preserve its integrity, such metadata should only be available through third parties. And finally, it is reasonable to expect numerous evaluations of the same resource.

Though evaluative metadata is a new field, there is a rich history in the field of evaluation generally, rich enough to suggest that there will necessarily be numerous types of evaluation, and therefore varying schemas used to define the form of such evaluations. Stufflebeam's (1971) CIPP (context, input, process, product) model, for example, proposes metrics for different stages of resource production. Williams points to the need for different sorts of evaluations for different target groups: instructors, students, and instructional support people (we might also add instructional designers). (Williams, 2000) Evaluations may be the result of individual assessment, aggregated form-filling (such as at Hot or Not (Hot or Not, 2003)), or the result of a collaborative process. (Nesbit, 2002)

Many evaluative schemes propose multiple dimensions of evaluation. For example, Nesbit, Belfer and Vargo propose a 10-part metric for learning resources, including aesthetics, design, accuracy, support for learning goals, and six others. MERLOT incorporates a three-section multi-facet metric. (Bennett and Metros, 2001) Different types of media require different types of evaluation; even this partial list is sufficient to show that a different metric is required for mammography images, including this very precise set for clinical images: "Positioning; Compression; Optical density; Sharpness; Contrast; Noise; Exam identification; Artifacts." (Michigan, 2002) And while evaluation is typically considered to be related to the quality of an object, other metrics, such as the resource's placement within Bloom's taxonomy, may also be applied. (McGee, 2003)

For all that, from the point of view of users, the most popular sort of evaluation is likely to be a one-dimensional evaluation using either numerical values or a bi-value 'thumbs up - thumbs down' metric in the style of Siskel and Ebert. (WCHS, 1998) Even so, such a simple evaluation metric is far beyond the capabilities of the learning object metadata proposed in IEEE-LOM. The metadata standard does not allow any fields for evaluation, and even were evaluation to be allowed among the classification or annotation (Friesen, 2003e) metadata, it is unlikely that content producers would be inclined to include a 'thumbs down' rating among their metadata. Evaluative metadata must be third party metadata, and with that, forces the use of the learning resource profile herein described rather than the single-source model proposed in IEEE-LOM.
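A one-dimensional 'thumbs up - thumbs down' tally drawn only from third parties can be sketched as below. The evaluator names, the record layout and the rule for excluding the creator's own rating are assumptions made for the illustration.

```python
# Hypothetical evaluations harvested from several locations.
EVALUATIONS = [
    {"resource": "example:r1", "evaluator": "reviewer-a", "thumbs_up": True},
    {"resource": "example:r1", "evaluator": "reviewer-b", "thumbs_up": False},
    {"resource": "example:r1", "evaluator": "reviewer-c", "thumbs_up": True},
    {"resource": "example:r1", "evaluator": "vendor", "thumbs_up": True},
]
CREATORS = {"example:r1": "vendor"}   # first party for each resource

def tally(resource: str) -> tuple:
    """Count third party thumbs up and thumbs down, ignoring the creator."""
    up = down = 0
    for evaluation in EVALUATIONS:
        if evaluation["resource"] != resource:
            continue
        if evaluation["evaluator"] == CREATORS.get(resource):
            continue                  # first party opinions are not counted
        if evaluation["thumbs_up"]:
            up += 1
        else:
            down += 1
    return up, down

result = tally("example:r1")
print(result)
```

The 'thumbs down' from reviewer-b survives in the tally precisely because it was never part of the vendor's own metadata.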

4.6 Educational Metadata

One section of the IEEE-LOM standard is devoted to educational metadata. (Friesen, 2003f) Included within this section are fields for learning resource type, interactivity, semantic density, end user role, typical age range, difficulty, and typical learning time.

It should be clear that educational metadata qualifies as third party metadata rather than first party. For as in the case of classification, while the content author may have an opinion about the educational use of an object, the author's opinion is unlikely to be the last word, as practitioners will over time be able to more accurately describe a resource's educational nature.

The definition, moreover, of educational metadata as third party metadata additionally allows many resources not designated as learning resources to be interpreted as such, vastly expanding the domain of potential learning resources. Many websites, academic articles, images, and the like were created for purposes other than education, and nonetheless have educational value. While the authors may have no inclination or interest (not to mention ability) to classify their work as an educational resource, such a classification could be undertaken by a third party.

It is worth mentioning that the definition of educational metadata ought to be the sole activity of groups such as IEEE-LTSC, which are well positioned to undertake such a project. As mentioned elsewhere in this paper, other types of metadata are more expertly handled by bodies and associations devoted to that type of subject material. And educational metadata, even in IEEE-LOM, should be more carefully defined and expanded. Criticisms in the CanCore documents of the IEEE-LOM account of interactivity and learning resource type, for example, are reflective of this. (Friesen, 2003f) By failing to clearly delineate between educational properties and (say) format properties, IEEE-LOM introduces ambiguity into the description of these elements. Does 'type' mean pedagogical type (lesson, lecture, quiz) or media format?

In practice, and as will be discussed below, it is likely that some educational metadata will not be expressed as either first party or third party metadata, but as 'second party metadata', that is, metadata defined by, and in the course of, the use of the resource. The best evidence that a resource is appropriate for a given age range, for example, is that teachers elect to use it for students of that age range. Data such as context, end user role, and typical learning time are data related explicitly to use, and are therefore obtained through observations of use.

The supposition that some educational metadata may be second party metadata implies that a separate class of metadata be defined in order to define contexts of use. This, also, would be a fruitful project for IMS or IEEE-LOM. A complete description of educational context metadata is beyond the scope of this paper. However, it is important to assert that such metadata would describe not only the pedagogical context (class, subject, educational role) in which a resource is used, but also information about the user (age, education, grade level) and the technical environment. The idea here is that educational use metadata, conjoined with learning resource metadata, produce what may be called a 'use instance', collections of which may be harvested by metadata aggregators.

4.7 Sequencing and Relational Metadata

The Resource Description Framework (RDF) itself contains definitions of relational metadata, such as containers, 'see also', and 'defined by'. (W3C, 2003) Ontologies extend this capacity by defining such things as memberships in classes. Viewed formally, relations between any two entities may be defined. (SAS, 2003) This allows for the description of a wide variety of relationships. CREAM (Creating RElational, Annotation-based Meta-data) relationship metadata, for example, can be used to describe the professor to graduate student relation. (Handschuh, 2001) Similar metadata describing relations among learning resources is needed.

People have started to use relationship metadata to describe digital resources. One of the simplest non-ontological relationships is the reference or the citation. NEC's CiteSeer (NEC, 2003) is a good example of the collection of this information. In HTML, citations are indicated by links, and as mentioned above, Google's PageRank collects (but does not distribute) such data. In the world of Rich Site Summary (RSS) the collection and distribution of relational metadata is common; Technorati (Technorati, 2000) provides metadata feeds describing a resource's 'Link Cosmos'.

Relationship metadata may be used to sequence learning resources. The IMS Learning Design and Simple Sequencing specifications, for example, are essentially a system for sequencing learning objects. (Downes, 2003b; IMS, 2002) Without relationship information in learning object metadata, however, there is no means of locating learning resources for a given learning activity. (Downes, 2003b) Learning Design handles this by explicitly referring to specific learning objects, but the resulting design is not reusable. Significant progress could be achieved by capturing the placement of learning resources in learning designs as a type of learning object metadata; this would define a relationship between two resources placed into the same learning design, and would allow an automatic system, when one is used, to suggest the other.
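The idea of deriving a relationship from shared placement in a learning design can be illustrated with a minimal sketch. It assumes, purely for illustration, that each design is available as a list of resource identifiers; the names `build_cooccurrence`, `suggest` and the sample identifiers are hypothetical and are not drawn from IMS Learning Design or any other specification.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(designs):
    """Count how often two resources are placed in the same learning design."""
    counts = defaultdict(lambda: defaultdict(int))
    for resources in designs.values():
        # Each unordered pair within one design counts once for that design.
        for a, b in combinations(sorted(set(resources)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def suggest(counts, resource, limit=3):
    """Suggest the resources most often placed alongside the given one."""
    related = counts.get(resource, {})
    # Most frequent first; ties are broken alphabetically for stable output.
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical designs and resource identifiers, for illustration only.
designs = {
    "design-1": ["intro-video", "quiz-a", "reading-1"],
    "design-2": ["intro-video", "quiz-a"],
    "design-3": ["reading-1", "quiz-b"],
}
counts = build_cooccurrence(designs)
print(suggest(counts, "intro-video"))
```

Here the relationship is nothing more than a count of shared placements; a working system would presumably weight such counts against the other relational metadata discussed in this section.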

More robust relations between learning resources can be described. While sequencing is, essentially, nothing more than an attempt to establish an order among learning resources, various (and as yet not proposed) relationship metadata could describe pedagogical relations, semantic relations, social relations, cultural relations, and more. Rather than taking into account mere sequencing, advanced learning design systems would have the capacity to associate, rank and suggest learning resources according to a wide variety of criteria, including those found not only in the resource metadata but also in the use instance.

Relational metadata is almost exclusively second party metadata. Though some content authors may desire to express the relationship between two learning resources, as in a series of lessons, for example, many relationships could not be imagined by their authors. Such relationships typically emerge as useful data from descriptions of use - the identification of actual links, actual references, actual placement in sequence, actual association with a certain task.

4.8 Interaction Metadata

Interaction metadata defines the types and nature of interactions that may be supported by a resource. That this sort of metadata is required is established by the observation that many resources are not static documents, but may be bits of programming, online services, or even people willing to perform a certain service or task.

There are two major types of interaction metadata, internal and external. Internal interaction metadata describes means of interacting with the resource itself. For example, a web service will support a certain set of requests. The nature of these requests and the parameters supported are defined using Web Services Description Language (WSDL). (W3C, 2001) Or for example, a document may allow for certain types of customization by a reader, perhaps by allowing background colours to be set or logos to be inserted. Internal interaction metadata would define these.

External interaction metadata describes interactions that are supported by services related to the resource, but external to the resource. The RSS trackback system is a good example of this. In trackback, one person makes a resource available online. After a time, a second person creates a link to the resource. Trackback allows the second person to notify the first person that the link has been created; information about this new link will then be associated with the original resource. Trackback metadata associated with a resource tells the second person where to 'ping' the first person. (Al-Muhajabah, 2003)

In general, an internal interaction will be used only to modify a representation of a resource, not the original resource itself (the most significant exception is the Wiki). Internal interaction is therefore essential to the customization and personalization of resources. External interaction, on the other hand, is used almost exclusively to modify the resource metadata. External interaction is therefore essential to the description of a resource.

4.9 Rights Metadata

Rights metadata is in one sense first party metadata, since no entity other than the creator may specify the usage conditions for a particular resource, and at the same time third party metadata, because as described above rights are best described via a model hosted by a third party.

A great deal of effort has been undertaken in the area of rights metadata. For learning resources the major rights metadata formats are Open Digital Rights Language (which handles metadata modeling very easily) (Iannella, 2003) and MPEG-REL (Rightscom, 2003), formerly called XrML, developed by ContentGuard. A functioning example of the use of rights metadata models is the Creative Commons system, which allows web page developers to specify use conditions by pointing to the appropriate model. (CreativeCommons, 2003)

Rights metadata will be supported by stand-alone profile management services that act as brokers for personal information on the internet. Because many such services exist, a person can choose a local and trusted broker (one they can sue if things go wrong). Because such services are working on behalf of the user (and may be paid for by the user, though no doubt free brokers will exist as a public service) there is no vested interest in ownership of the information: it is to the broker's advantage to ensure that the user owns his or her information. (Downes, 2002)

4.10 Metadata Types in Retrospect

The survey of metadata types just completed should be sufficient to show that many different types of metadata are appropriate for different types of resources. Different types of metadata may also be employed to serve different user groups and to accomplish various tasks. It should also be evident that learning resource metadata should not be authored by a single entity.

The metadata types listed above were grouped into three major categories. First party metadata describes properties intrinsic to the learning resource, properties about which there should be no dispute or reason to question the author, and properties most often described by the author of the resource. Second party metadata is related to the context of use, and while not (necessarily) authored by the user of a resource, describes the circumstances in which the resource was used. Third party metadata is authored by those not directly related to the creation or use of the resource, and generally expresses an opinion, such as to the classification or quality of the resource.

Only some types of metadata schemas should be defined by learning metadata standards bodies, such as IEEE-LTSC. In particular, the scope of learning object metadata schemas should be restricted to information related directly to learning itself. The definition of metadata formats for bibliographic metadata, technical metadata, classification, and rights should be left to their respective expert groups. Some sequencing and evaluative metadata schemes could also be authored by the LTSC, but these efforts should be restricted to specifically educational metadata descriptions.

Though it may appear at first that this paper proposes a metadata chaos instead of the harmony of a single educational metadata standard, it should be observed that the use of resource profiles as described promotes greater, not less, standardization. Education is only one discipline among many. If each discipline attempts to define metadata standards beyond its pale, many needless and duplicate standards will result. Defining resources in terms appropriate to the resource moreover promotes interoperability not only within the discipline but also across disciplines.

5. Using Resource Profiles

5.1 The Lifecycle of the Learning Resource

When a child is born, it begins life with only the most minimal of metadata. It will have some bibliographic metadata: a name, some creators, and perhaps a hospital identification number. And it will have some technical information: height, weight (in ounces), a home address. Very little more, in fact, is placed in the birth notice. As the child grows and acts in the world, metadata accumulates. It acquires a track record of achievements and certifications. It begins to be discussed by other people, perhaps reviewed, perhaps recommended.

Nobody would expect the parents of a child to enter its complete life metadata into a hospital form, and so too with learning objects. When a new resource is created, it may be released to the world with only the most basic metadata: a title, catalogue number, author and owner information, a description, perhaps, and technical information. The author selects rights information by selecting a rights model from a dropdown list, and this information, and with it, the resource, is flung into the world, captured by metadata harvesters, and the new resource begins life in various "What's New" reports and daily listings.

The first round of harvesting is attended by the first round of classification, as automatic sorting systems (such as that used by Edu_RSS Topics (Downes, 2003a)) place the resource metadata into topical feeds. As the network develops, more advanced Bayesian categorization algorithms are used. (Udell, 2003) As the resource is noticed and read it may begin to attract some discussion. On resource rating sites (such as Edu_RSS Ratings (Downes, 2003c)) readers may submit preliminary reviews in the form of a one-to-ten rating. Other people add to its record by linking to it. Readership, links and ratings propel the resource higher on certain resource rankings; it is shared within and across disciplines (Levine, 2002) and it begins to be noticed by a wider audience.

Now considered a serious learning resource, easily separable by use and popularity from the masses of one-off magazine articles and hobbyist web pages, it catches the notice of an academic and is given a formal academic review. This vaults it into a new category of recognition; in short order it is certified by a professional association. This new metadata is added to its profile and brings it into the search results obtained by instructors and designers. After being used in-class a few times, to favorable student rankings, the resource is selected for inclusion in a course package - an instructional designer drags it from the search results into a design template. This deployment produces a wealth of information, as the resource is now clearly associated by use with a subject, grade level, educational activity and more.

By now the resource is well established, enjoying the prime of its career. It is easily recommended by search services responding to requests for just this type of resource, and is incorporated by the less personalized but more efficient automatic learning design programs. Over time, strongly associated with a set of similar resources, it becomes a part of a cluster of items that represent the canonical learning materials for this particular field. Until, gradually, inevitably, its age begins to show. The evaluations begin to decline, and more instructors begin selecting the new, hot, version 2.0, and after a useful life, serving tens of thousands of students, the resource becomes a part of the internet archive.

5.2 Generating Resource Profiles

Since the release of the first IMS learning object metadata protocols there have been concerns and complaints about the number of fields indexers are required to complete. (Monthienvi, 2001 for example) What the lifecycle story just completed should show is that very little manual entry of metadata is required. Moreover, it should also show that the task of authoring learning resource metadata is widely distributed, undertaken by a variety of volunteer, professional and commercial agencies.

The key act of metadata creation is the authoring of bibliographic information when the resource is created. Such metadata is created as a part of the authoring process; the title, abstract and author information are gathered en passant by the authoring software, which, in the way blogging software organizes personal website contents, automatically generates the initial metadata. (Gillmor, 2003) More advanced authoring software gathers the technical information describing the resource; no personal intervention should ever be required to record the fact that a document is XML, HTML or PDF, for example.
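As a rough illustration of metadata gathered en passant, the following sketch records a document's format without asking the author for it. It assumes the authoring tool already knows the file name, title, author and abstract; the function `initial_metadata` and its field names are hypothetical, while `mimetypes.guess_type` is part of the Python standard library and infers the format from the file extension alone.

```python
import mimetypes


def initial_metadata(filename, title, author, abstract):
    """Assemble a minimal first party record at authoring time."""
    # The format is inferred from the file name; the author never types it.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return {
        "title": title,
        "author": author,
        "abstract": abstract,
        "format": mime_type or "application/octet-stream",
    }


# Illustrative values only.
record = initial_metadata(
    "resource-profiles.html",
    "Resource Profiles",
    "Stephen Downes",
    "A multi-faceted description of a resource.",
)
print(record["format"])
```

More careful software would inspect the file contents rather than trust the extension, but even this much removes one field from the list an indexer is asked to complete.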

Though much effort has been dedicated toward the categorization of learning and other resources, it is likely that the bulk of such work will be handled by increasingly sophisticated filtering services. Even Edu_RSS's basic system, using Perl regular expressions (Franklin, 1994) to define categories, achieves a high degree of precision. Already deployed widely as anti-spam filters, systems using Bayesian probability metrics are being widely considered as classifiers. (Karieauskas, 2002) Finally, neural network software is widely considered to be uniquely capable of identifying categories in large clusters of resources. (Ruiz, 2003)
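A regular expression categorizer of the kind just mentioned might look something like the following. This is a sketch only: it uses Python's `re` module in place of Perl, and the category names and patterns are invented for illustration and are not the patterns Edu_RSS actually uses.

```python
import re

# Hypothetical categories; each is defined by a single regular expression.
CATEGORY_PATTERNS = {
    "learning objects": re.compile(r"\blearning objects?\b", re.IGNORECASE),
    "metadata": re.compile(r"\b(metadata|IEEE-LOM|Dublin Core)\b", re.IGNORECASE),
    "syndication": re.compile(r"\b(RSS|Atom feed)\b"),
}


def categorize(text):
    """Return every category whose pattern matches the title or description."""
    return [
        name
        for name, pattern in CATEGORY_PATTERNS.items()
        if pattern.search(text)
    ]


print(categorize("New RSS feed describes learning object metadata"))
```

A resource may fall into several categories at once, or into none, in which case an aggregator might simply decline to store the record.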

Potentially the most useful metadata will be created through use. It is arguable that the best determination of the proper classification and description of the resource is obtained via contextual information. (Downes, 2003d) From a practical standpoint, this implies the employment of context-aware learning resource viewers and management systems. When a resource is selected for use in a given context, this use is captured and the information placed in an accessible metadata file. The appropriate classification of the resource may then be determined as a function of the aggregated contextually-generated metadata.
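One simple reading of 'a function of the aggregated contextually-generated metadata' is a majority vote over recorded uses. The sketch below assumes each captured use is a small record with fields such as `subject` and `grade`; these field names, the function `classify_by_use` and the `min_share` threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def classify_by_use(use_instances, field="subject", min_share=0.5):
    """Pick the most common value of a field across recorded uses.

    Returns None when there is no data or no value reaches min_share.
    """
    values = [use[field] for use in use_instances if field in use]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= min_share else None


# Hypothetical use records captured by context-aware viewers.
uses = [
    {"subject": "biology", "grade": 9},
    {"subject": "biology", "grade": 10},
    {"subject": "chemistry", "grade": 10},
]
print(classify_by_use(uses))
print(classify_by_use(uses, field="grade"))
```

The threshold guards against classifying a resource on the strength of a scattered handful of uses; other aggregation functions would serve equally well.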

5.3 The Metadata Distribution Network

Some words are necessary regarding the organization of a system for harvesting and using resource profile metadata, as it is not evident at first glance how a system deploying multiple metadata formats, multiple servers and multiple authors may be structured, much less how such a system may actually promote, rather than hinder, interoperability.

It is perhaps most common to think of such a network as completely connected, that is, everybody accesses everything. Such a system, as a searcher's experience with Google may suggest, can be overwhelming. Moreover, it is nearly impossible to achieve precision when everything is interconnected; setting 'Google Alert' (Google, 2003a) to return new references to 'RSS', for example, results in my receiving mostly items about Indian politics, and not the material on Rich Site Summary required. Thus, in addition to thinking of the network of resource profiles as a distribution system, it is also necessary to think of it as a filtering system. The obtaining of finely grained search results is a consequence of decisions made by aggregators and harvesters at various points in the system.

As mentioned above, resources are initially entered into the system using first party metadata. This metadata is created as the file is authored and is made available on the resource owner's web server or resource repository. It is then harvested by a harvester, and at this point the first filtering decision is made: not every aggregator harvests every resource's metadata. Out of the thirty thousand or so RSS feeds available, for example, Edu_RSS harvests about 200. This makes Edu_RSS a highly selective filter of metadata content before the first resource is even seen.

As metadata is received from the 200 repositories, it is evaluated by Edu_RSS. It arrives in a variety of formats (five different versions of RSS, various RSS modules, Atom, and Dublin Core). Each of these feeds is translated using XSLT into an internal format unique to Edu_RSS. Not all information contained in the original feed is stored, only that which is relevant to Edu_RSS (which allows a source metadata file to specify 18 technical parameters, and for Edu_RSS to store, simply, 'image/gif'). Edu_RSS also adds to the metadata en passant; its major addition is to categorize the resource, as described above, but it also adds date, author and source information as appropriate. Edu_RSS may optionally reject metadata records that, say, fall outside its categorization criteria. This is the second layer of filtering.
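The translation step can be sketched as follows. Edu_RSS is described as using XSLT; since the Python standard library has no XSLT processor, this example substitutes `xml.etree.ElementTree` to show the same idea of reducing different feed formats to one small internal record. The internal field names, the sample feeds and the function `normalize` are assumptions made for illustration, and the Atom namespace used is that of the later Atom 1.0 format.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Illustrative sample feeds, not real harvested data.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Resource Profiles</title><link>http://example.org/profiles</link>
<enclosure url="http://example.org/fig.gif" length="1024" type="image/gif"/>
</item></channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>Use Instances</title><link href="http://example.org/use"/></entry>
</feed>"""


def normalize(feed_xml, source):
    """Translate an RSS 2.0 or Atom feed into a list of internal records."""
    root = ET.fromstring(feed_xml)
    records = []
    if root.tag == "rss":
        for item in root.iter("item"):
            enclosure = item.find("enclosure")
            records.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                # Of all the technical detail, only the media type is kept.
                "format": enclosure.get("type") if enclosure is not None else None,
                "source": source,  # added by the aggregator en passant
            })
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            records.append({
                "title": entry.findtext(ATOM + "title", default=""),
                "link": link.get("href") if link is not None else "",
                "format": None,
                "source": source,
            })
    else:
        raise ValueError(f"unsupported feed format: {root.tag}")
    return records


print(normalize(RSS_SAMPLE, "example-rss") + normalize(ATOM_SAMPLE, "example-atom"))
```

Whatever the input format, downstream categorization and filtering see only the internal record, which is what allows new feed formats to be supported by adding a branch rather than by changing the rest of the system.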

What is important to note is that Edu_RSS does not act alone as an aggregator. It operates alongside dozens, maybe hundreds, of aggregators, each dedicated to a specific niche. While some aggregators, such as Technorati, NewsIsFree, Daypop and more, aggregate from the entire list of weblogs, others are dedicated to specific topics or specific types of data (such as images or videos). Though the majority of RSS readers today read individual channels, because of the volume of material it is likely that people will begin to read selected feeds provided by aggregators (already, services such as Daypop's 'Top 40' are among the most popular).

As resources are released they are reviewed. Any of hundreds of reviewers may take part in this process, and each reviewer may select certain types of resources and employ their own research criteria. These reviews, also made available as metadata, are harvested in exactly the same manner as first party metadata and are, using XSLT again, joined to the original metadata record. At this point, the collected metadata for a particular resource begins to resemble a resource profile. Crucially (unlike peer reviewing as practiced by academic journals), these reviews are not themselves a filtering mechanism; they may, however, optionally be used as input for the next layer of filtering.

Edu_RSS and other aggregators offer output feeds of harvested and organized metadata. These output feeds include not only first party metadata but also categorization information and aggregated review information. Metadata for the same resource may vary from aggregator to aggregator, depending on the categorization and evaluation mechanisms employed. Typically, an aggregator will offer numerous feeds, dedicated to specific subject areas, specific authors, specific dates, search results, and so on.

By specifying a query to the aggregator, the third layer of filtering is deployed. This third layer applies user preferences against the already filtered metadata offered by the aggregator. Most often a user will request metadata on a certain subject or as a certain search result. But the user may at this point include numerous additional criteria, on either a case-by-case or default basis. For example, they may require that resources displayed have achieved a certain evaluation value, have obtained a certain certification, be associated with certain digital rights, be authored in a certain language, or, for that matter, satisfy a given range for any of the metadata values supported by the aggregator.
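A minimal sketch of this third layer follows, assuming the aggregator holds its records as simple field-value mappings. The preference format (an exact value, or a `(low, high)` pair for an inclusive range), the field names and the functions `matches` and `third_layer` are hypothetical conventions adopted for the example.

```python
def matches(record, preferences):
    """True when a record satisfies every user preference.

    A preference is either an exact value or a (low, high) inclusive range.
    A record that lacks a requested field does not match.
    """
    for field, wanted in preferences.items():
        value = record.get(field)
        if value is None:
            return False
        if isinstance(wanted, tuple):
            low, high = wanted
            if not (low <= value <= high):
                return False
        elif value != wanted:
            return False
    return True


def third_layer(records, preferences):
    """Apply user preferences to metadata already filtered by the aggregator."""
    return [record for record in records if matches(record, preferences)]


# Hypothetical aggregated records.
records = [
    {"title": "A", "rating": 8.5, "language": "en", "rights": "cc-by"},
    {"title": "B", "rating": 4.0, "language": "en", "rights": "cc-by"},
    {"title": "C", "rating": 9.0, "language": "fr", "rights": "all-rights-reserved"},
    {"title": "D", "language": "en", "rights": "cc-by"},
]
preferences = {"rating": (7, 10), "language": "en"}
print([record["title"] for record in third_layer(records, preferences)])
```

Excluding records that lack a requested field is only one possible policy; the discussion of projected metadata in section 5.4 suggests an alternative, in which a missing evaluation is estimated rather than treated as disqualifying.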

Aggregators such as Edu_RSS retrieve the requisite metadata, and in one final transformation process, convert it using XSLT into the format requested by the searcher. Hence, for example, Edu_RSS outputs metadata in the five RSS formats and Atom, as well as in plain text, HTML, JavaScript, or (planned) email or web services. At least one of these formats will be compliant with virtually any application the searcher is using (and typically, the searcher would not even concern himself about the reply format). This is end-to-end standards compliance: no matter what format the resource provider uses to express resource metadata, the user is able to use it transparently in his or her own application. The hard work is performed by the intermediary services that harvest, transform, and deliver the search results.

5.4 Projected Metadata

It should be clear that many resources will have very little associated metadata, especially when they are first deployed. How is it possible to determine the classification or the value of an object, for example, when it has never been used?

The analysis of existing metadata permits the extrapolation of unknown metadata from existing metadata. For example, knowing that the author of a certain piece is 'David Wiley' it would be reasonable to infer that the resource will be about learning objects or a related topic. It is, by contrast, unlikely to be a treatise on advanced biochemistry. In a similar manner, estimations of the likely quality of an object may be inferred from existing metadata. A learning resource authored by 'Joe Schmoe' may be predicted to be of low quality based on existing evaluations of Schmoe's earlier work.

The simplest form of predictive metadata is based on averages. In a given metadata collection, all objects with a certain metadata value (author='Joe Schmoe', for example) are considered. The evaluation values for these objects are added and then divided by the number of objects. This produces an average; the average is then projected to be the evaluation value for a new, as yet unevaluated, object. More sophisticated forms of prediction include combinations of factors (author='Joe Schmoe' and topic='biochemistry') and conditional probabilities.
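The averaging procedure just described can be written out directly. In this sketch the collection is a list of records with an `evaluation` field; the function name `project_evaluation`, the field names and the sample values are assumptions made for illustration.

```python
def project_evaluation(collection, **criteria):
    """Project an evaluation as the mean over objects matching the criteria.

    Returns None when no evaluated object matches.
    """
    scores = [
        obj["evaluation"]
        for obj in collection
        if "evaluation" in obj
        and all(obj.get(key) == value for key, value in criteria.items())
    ]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hypothetical evaluated objects.
collection = [
    {"author": "Joe Schmoe", "topic": "biochemistry", "evaluation": 3},
    {"author": "Joe Schmoe", "topic": "metadata", "evaluation": 5},
    {"author": "Joe Schmoe", "topic": "biochemistry", "evaluation": 4},
    {"author": "David Wiley", "topic": "learning objects", "evaluation": 9},
]
print(project_evaluation(collection, author="Joe Schmoe"))
print(project_evaluation(collection, author="Joe Schmoe", topic="biochemistry"))
```

Passing more than one criterion gives the 'combination of factors' case; conditional probabilities would require a richer model than a plain average.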

Since the projection of metadata is essentially an associative activity, more complex relations between input values and predicted values may be derived using neural nets. In such a case, unanticipated relations may be found and be employed to form background hypotheses. These background hypotheses are then weighted in combination with specific predictions. For example, while David Wiley may be predicted to write good resources about metadata, we have no way of knowing how well he would write about object sequencing. But if we know that people who write well about metadata also write well about sequencing, then we can apply this generalization to the more specific case.

It is worth noting that projected metadata is not of much use without the expanded metadata set available with resource profiles. Though projections could be made within the confines of IEEE-LOM, such projections will not be of a great deal of use. The more subjective the metadata property, the more useful projected metadata will become, because it is subjective metadata that is the most difficult to collect and the most useful in application.

Because projected metadata is formed by hypothesis, it is important that users be aware of this status (hence, reification is doubly important), and it is important to build corrective measures. Obviously, one review does not over-rule a projection based on a substantial body of evidence, but as the number of actual values is increased, the importance of the projected metadata should decrease. If the projected metadata is appropriately formed, then the actual evaluations should trend toward the projected values; if they do not, this information should be used to correct the projection algorithm.
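One way to let the projection fade as real evaluations arrive is a weighted average in which the projection counts as a fixed number of 'virtual' reviews. This is a sketch of one possible corrective scheme, not a method proposed in the sources cited here; the function `blended_value` and the `prior_weight` parameter are assumptions.

```python
def blended_value(projected, actual_ratings, prior_weight=5):
    """Blend a projected value with actual ratings.

    The projection is treated as prior_weight virtual ratings, so its
    influence shrinks as the number of actual ratings grows.
    """
    n = len(actual_ratings)
    if n == 0:
        return projected
    return (prior_weight * projected + sum(actual_ratings)) / (prior_weight + n)


print(blended_value(4.0, []))        # no reviews yet: the projection stands
print(blended_value(4.0, [9]))       # one review barely moves the value
print(blended_value(4.0, [9] * 20))  # many reviews dominate the projection
```

The gap between the projected value and the eventual mean of the actual ratings is itself useful data: a persistent gap for a given class of resources indicates that the projection algorithm needs correcting.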

5.5 Data Network Properties

From above, we recall that the purpose of a metadata system is to enable users to create, store, locate, and retrieve resources. The concept of resource profiles has a bearing on each of these processes. This bearing is perhaps best illustrated by sketching two major types of resource distribution network. It will be argued here that the adoption of resource profiles favours the use of a harvesting (or distributed) network, rather than a federated system.

The distinction is at times an elusive one. Stephen Lanahas argues, for example, that the internet is actually a federated system, and not a distributed system as is usually presupposed. "The word 'distributed' implies that one system will be spread out to different locations. Federated supports the sense that we're dealing with a host of systems, (which in truth are not really components if they can stand alone). These systems form a "cooperative" community which can be morphed into many manifestations simply by inclusion or exclusion of end-user configuration / access to them." (Lanahas, 2002)

From the point of view of applications, Lanahas is without doubt correct. Very few applications are distributed (although grid computing poses an interesting counterexample). Services, such as web servers, are stand alone, and a user will use these services sequentially, access being granted on a case by case basis. "What we see in any viable global architecture is the need to segregate application functionality and provide efficient data flows between them," argues Lanahas. This runs contrary to current trends in e-learning. He observes, "most people in the information technology arena have been focused on delivering tight integrations for the past decade or more." Thus, for example, a login accepted in one system is carried (through Shibboleth, say) to another system.

From another point of view, however, the internet may be viewed as a massively distributed system. Data is not stored in one place, but is spread across the internet. Information about a given person, for example, may be found on dozens of different websites or data servers. This, too, is contrary to current trends in learning technology. What has emerged, especially in the area of metadata repositories, is what I have characterized as a 'silo' model of information management. (Downes, 2003e) Tight interoperability between applications is necessary because applications must interact with each other in order to use combinations of data.

Hence, for example, until very recently a metadata repository such as MERLOT refused to share its metadata (and even today, only shares a small subset of it). Learning Content Management Systems do not share data at all, depending instead on an internal database system, or 'library', of learning content. The contents of these libraries must be specially designed to interoperate with each other; hence the advent of detailed specifications such as SCORM.

The dangers of tightly integrated applications depending on narrowly defined data should be evident (and if not, can be drawn by analogy from Microsoft's approach to the desktop environment, which operates according to a similar paradigm). First, such a system is increasingly vulnerable to malfunction or attack. Because systems are tightly coupled, a flaw in one becomes a flaw in all. Hence, in Windows, a buffer overflow error in MSN Messenger can be used to compromise user login or data access routines. In a common login system, such as Shibboleth, an unauthorized login to one system exposes all connected systems to attack.

Second, interoperability becomes increasingly difficult. The need at all for events such as 'PlugFests' should illustrate the danger here. It becomes almost impossible for new players to enter such a network, as the technical overhead becomes impossible to manage. Innovation becomes increasingly difficult as interoperability constrains what may be done. Such a system tends, as we see again with the Microsoft analogy, to favour a single, monolithic system. Vendor lock-in becomes common, and consequently prices increase as the cost of migration increases.

Third, a tight coupling between applications limits the range of data that any given application can manage, and any data in the system must be usable by all applications in the system. This results in the creation of unnecessarily complex data formats (the current specification for a SCORM compliant learning object serves as an example), with a significant danger of this data format becoming proprietary and inaccessible. That one cannot read MS Word documents without a Microsoft-compliant product is an example of this danger; that one cannot read a PDF document without Acrobat Reader is another. Products that unscramble these formats (such as are now available in Linux) become increasingly difficult to develop, and, since the DMCA, are illegal.

(As a parenthetical remark: recent developments in the RSS community, involving the use of pinging and trackbacks [ref], pose just this sort of danger, and should be avoided. Though they offer a promise of greater efficiency, they needlessly restrict the scope of the network, pose barriers against new application development, and introduce new vulnerabilities, as the recent 'Lolita' spam demonstrated. (Trott, 2003))

The use of a loose, variable format, such as is described in this paper, argues against tightly integrated applications. Any given application, because it is stand-alone, can read and create as much, or as little, metadata as it requires. An application may be a part of the network without instantiating all properties of the network. For example, the proposed DRI specification suggests that any and all repositories support the 'search' function, an essential requirement in a federated system. This compels every repository to support a considerable overhead. But if data flows freely, search can be handled by specialized applications (such as aggregators), relieving smaller repositories of this burden.
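The idea that an application reads only as much metadata as it requires can be illustrated with a minimal sketch. The record below is hypothetical: the Dublin Core namespace is real, but the `ex:` namespace, the field names under it, and the function `read_known_fields` are assumptions made up for this example, not part of any specification discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical record mixing vocabularies. Only the Dublin Core namespace
# is real; the "ex" namespace and its elements are invented for illustration.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:ex="http://example.org/ratings#">
  <dc:title>Introduction to Fugues</dc:title>
  <dc:creator>A. Author</dc:creator>
  <ex:rating>4</ex:rating>
  <ex:unknownField>ignored by this reader</ex:unknownField>
</record>"""

DC = "{http://purl.org/dc/elements/1.1/}"


def read_known_fields(xml_text, wanted):
    """Return only the fields this application understands; skip the rest.

    `wanted` maps fully qualified element names to the application's
    own field names.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for child in root:
        if child.tag in wanted:
            found[wanted[child.tag]] = (child.text or "").strip()
    return found


# An application that cares only about titles simply ignores everything else.
title_only = read_known_fields(RECORD, {DC + "title": "title"})
```

A second application could pass a larger `wanted` mapping and read the rating as well; neither needs to know what the other supports, and unrecognized elements do no harm.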

Much of the potential described in this paper is simply not possible in a federated system. Distributed evaluation metadata and contextual metadata are captured in many places. It is not clear how a searcher could rank learning object search results according to such criteria, since they will not be present in the metadata made available by the learning resource provider (one might suggest that such metadata be reported to the originator, but it immediately becomes suspect, as the owners of a resource are very unlikely to accept and pass on metadata reporting that its resource should not be used).
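As a rough sketch of what ranking by distributed evaluation metadata might look like outside a federated system, an aggregator could collect ratings from independent sources and order resources by them. The data, the URIs, and the function name `rank_by_third_party_rating` are all hypothetical, and a simple average is only one of many possible scoring choices.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation metadata harvested from independent third parties,
# none of it supplied by the resource providers themselves.
evaluations = [
    {"resource": "http://example.org/lo/1", "source": "reviewer-a", "rating": 4},
    {"resource": "http://example.org/lo/1", "source": "reviewer-b", "rating": 2},
    {"resource": "http://example.org/lo/2", "source": "reviewer-a", "rating": 5},
    {"resource": "http://example.org/lo/3", "source": "reviewer-c", "rating": 1},
    {"resource": "http://example.org/lo/3", "source": "reviewer-b", "rating": 2},
]


def rank_by_third_party_rating(evals):
    """Order resource URIs by mean rating, highest first (ties broken by URI)."""
    by_resource = defaultdict(list)
    for entry in evals:
        by_resource[entry["resource"]].append(entry["rating"])
    scored = [(mean(ratings), uri) for uri, ratings in by_resource.items()]
    return [uri for score, uri in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


ranking = rank_by_third_party_rating(evaluations)
```

The point of the sketch is only that the ranking is computed where the evaluations are gathered, which need not be where the resource is hosted.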

5.6 Interoperability

Many of the arguments for a single metadata standard or for a tightly integrated application network are based on a premise of increased interoperability. By allowing many metadata standards, and by decoupling applications, it is argued, we will return to the days where data produced on one computer could not be used on another computer.

It is indisputable that agreement on standards is necessary to promote interoperability. However, it is open to argument as to just what these standards should describe. Throughout this paper, a set of standards applicable to all applications and all data formats has been assumed: minimally, XML, and for wider functionality, RDF. These standards may be described as low-level standards; they are not specific to any type of data or any particular domain or discipline. What is key is that they are, insofar as possible, semantically neutral.

The standards being defended, though, are much higher level. They propose domain-specific metadata, application program interfaces, common object definitions, common taxonomies. They are not semantically neutral; indeed, they are often explicitly defined to propose a semantics, to allow people not only to use the same words when they communicate but to mean the same thing by those words.

No doubt there is utility in commonality of meaning; otherwise aircraft advertised as headed for London would find themselves landing in Tokyo. But agreement on meaning is not something that can be, or should be, stipulated in standards. This becomes particularly evident when the domain of discourse becomes less objective and more subjective. In such cases, a commonality of vocabulary denotes an agreement, a voluntary compact entered into in order to indicate an affiliation and common purpose. Such vocabularies are not prescribed, they are subscribed.

Interoperability need not be world-wide and universal; it may function according to community. Just as there is no need for biochemists to describe the pedagogical properties of a research article, so also two differing schools of thought may disagree on the classification of such a document. In times of great change, as Kuhn observes, such vocabularies may even become incommensurable.

Crucially, then, at a certain level, interoperability is not - and cannot be - a property of the resource. With respect to the meanings of words, interoperability is a property of the reader (after all, a word such as 'cat' does not inherently contain its own denotation; it must be interpreted, and against a conceptual background, a denotation derived). In a similar manner, with respect to the meaning of metadata (and other properties) of a resource, interoperability is and must be a property of the reader application.
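One way to picture interoperability as a property of the reader application is a reader that carries its own mapping from the vocabularies it has chosen to understand onto its own terms. This is a minimal sketch; the term names, the mapping `READER_VOCABULARY`, and the function `interpret` are hypothetical and stand in for whatever vocabularies a given community subscribes to.

```python
# Hypothetical mapping held by the reader, not by the resource. Each community
# of readers could maintain a different one. The term names are illustrative.
READER_VOCABULARY = {
    "dc:title": "title",
    "lom:general.title": "title",
    "dc:creator": "author",
    "lom:lifecycle.contribute.entity": "author",
}


def interpret(record, vocabulary=READER_VOCABULARY):
    """Translate the terms this reader recognizes; ignore the others."""
    interpreted = {}
    for term, value in record.items():
        local_name = vocabulary.get(term)
        if local_name is not None:
            # Keep the first value seen if two source terms map to one name.
            interpreted.setdefault(local_name, value)
    return interpreted


dublin_core_record = {"dc:title": "On Resources", "ex:mood": "cheerful"}
lom_record = {"lom:general.title": "On Resources"}

same_meaning = interpret(dublin_core_record) == interpret(lom_record)
```

Here the two records use different words, and it is the reader's mapping, not anything in the records themselves, that makes them comparable.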

Consider what the web would have looked like were we to require that all web pages be 'interoperable'. At the time of the deployment of the web, we would have had to create MS Word and Word Perfect versions, there being a word processor standards war on at the time. Many important features of the web, such as hyperlinks and plug-ins, would have been impossible. Web pages, instead of averaging only a few kilobytes, would have been much larger, making the web itself almost unusable. But worst of all, the resulting network would not have been any more interoperable than the one that did, in fact, develop.

The success of an interoperable network is based on neutrality at the centre and robustness at the edges. We did not build one road network for Toyotas and another for Fords. We did not create one telephone network for business calls and another for personal use. The same holds true for learning resources. The more rigidly we define learning resources, and the more rigidly we define the tools that transport them, the less interoperable such a network becomes. Already we need special tools to convert Word documents to SCORM compliant learning objects [ref], an application which otherwise performs no useful function.

6. Concluding Remarks

6.1 The Future of Metadata

The science of metadata has been traditionally depicted as ordering the unordered, that "the purpose of metadata is to impose some order in a disordered information universe." (Lagoze, 2003) For the most part, however, this objective is misplaced. This is not because the desire to order the universe is misplaced; indeed, without the order inherent in natural laws and classifications the universe could not be comprehended at all. Rather, it is because the task of ordering information is best understood as something that is not accomplished in the creation of information, but rather, in the use of information. And the use of information is something that, like its object, almost defies order.

The central thread running through the concepts and mechanisms described in this paper is the recognition that the ordering of the universe, if it is to be accomplished at all, will not be accomplished in one place, in one way, or by one person. It is a recognition that a resource, like the proverbial elephant, may be viewed from different perspectives by different people. This is especially the case in more practical environments: a person buying an elephant, or seeking to use an elephant to pull a cart, will be interested only in a narrow set of properties, properties that might even be satisfied by certain oxen or horses better than some other elephants.

The second major thread running through this paper is the idea that, in order to be useful, these myriad descriptions must be communicated and connected one to the other. The idea is that, although there is no single common system of description, neither are there millions of individual descriptions. One person's description of a resource may have a great deal in common with another's, and these descriptions could usefully be clustered. Groups of people with a similar perspective on a resource will adopt a similar vocabulary. Hence the need for a two-way flow of description, to enable people with such common interests to draw from and support each other.

This essay is a description of the technical and conceptual infrastructure underlying a system of metadata that adheres to these two threads. As mentioned above, it attempts to employ existing protocols and processes rather than redefine the concept of resource profiles from scratch. That this is possible without major modifications to any of the existing protocols and processes described shows that, to a significant degree, the properties essential to the creation of a resource profiles network have already begun to be embedded in the metadata network. However, until the nature of resource profiles is widely understood and widely shared by practitioners, these initiatives will continue to operate in silos, in isolation from each other, and the longer term benefits of metadata will not be realized.

6.2 The Intelligent Network

One might ask, what are the longer term benefits of metadata? Where is the payoff? Near the beginning of this paper, it was suggested that the purpose of metadata was to enable people to be able to create, store, locate and retrieve resources. In this final section we will look at how a network as described above realizes these objectives.

A great deal has been written about applications and systems that will use metadata in order to accomplish, say, the task of searching for resources online. Some authors, for example, propose that intelligent agents will work with metadata in order to organize and filter online information. "Resource discovery by agents can enable qualitatively more flexible applications than those in existence today, due to the fact that systems can be built to intelligently react to situations and environment not known at the time of system design." (Lassila, 1997)

The use of intelligent agents, however, simply places on computer software the onus to perform tasks that humans have thus far not been able to do. There is no reason to suppose that agents will be more successful, because agents will face the same problems humans do. There are too many resources to search, too many possible interactions, uncertainties in vocabulary, and trust issues. If the organization of information remains unchanged, agents will have no more success than humans. But conversely, if the organization is modified, then humans themselves may be able to perform the tasks previously assigned to agents.

To understand how this is possible, it is necessary to shift one's point of view from the idea that the network of information needs to be organized to the idea that what we want is a self-organizing network of information. That is not to say that no human intervention is required: people will, of course, have to create resources, describe resources, and use resources. But it is to say that the impossible task of organizing, sorting, filtering and retrieving these resources will be performed not by agents working on the network, but by the network itself.

We are already familiar with self-organizing networks. The human brain is one such system: constituted of billions of interconnected neural cells responding to and comprehending myriad sensory input, the human brain, with no particular design or program (and certainly no homunculi), manages to arrange all that data into an understanding of the world. (Loder, 1996) The study of the functioning of the human brain has led to the development of neural networks as a theory of computation. Today, connectionist systems are widely understood and studied, and though they have evolved far beyond their original biological basis, the fundamental principles remain constant.

The first principle of neural network design is that it is a form of distributed processing. No one node, no one neuron, corresponds to a macro phenomenon such as 'understanding' or 'our idea of the city of Paris'. Each neuron, by itself, with only a partial understanding of the process, manages only one aspect of the total function or concept. And the second major principle is connectivity. Neurons send information to each other, not at random, but as input to layers of additional neurons. Thus, for example, in the human visual processing system we observe layers of interconnected neurons performing the task of resolving random visual data into what Marr called the "2 1/2 dimensional sketch". (Glennerster, 2002)

The network of resource metadata described in this paper emulates the neural network. Layers of raw, disorganized input are provided by resource creators. This information flows, via aggregation, to a secondary layer, which performs a preliminary sort and filtering. Metadata may flow through additional layers as necessary. Finally, it reaches the output layer, where the resources are used. Data from that use is then collected, and through what neural network theorists would call 'back propagation' this usage metadata is used to fine-tune the connections and processing in the resource network. The result is that no individual or organization 'organizes' the network; it organizes itself.
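The flow just described, from input through a filtering layer to use and back again, can be sketched in a few lines. This is a toy illustration under stated assumptions: the feed names, the weights, the threshold and the update rule are all invented, and the simple weight adjustment shown is a stand-in for feedback in general, not an implementation of back propagation as used in neural network training.

```python
def aggregate(items, weights, threshold=0.5):
    """Secondary layer: pass along items whose source weight meets the threshold."""
    return [item for item in items if weights.get(item["source"], 0.0) >= threshold]


def feedback(weights, source, useful, rate=0.1):
    """Usage metadata flows back: nudge a source's weight toward 1.0 if its
    item was used, toward 0.0 if it was ignored. Returns a new dict."""
    target = 1.0 if useful else 0.0
    current = weights.get(source, 0.5)
    updated = dict(weights)
    updated[source] = current + rate * (target - current)
    return updated


# Input layer: hypothetical items published by two resource creators.
items = [
    {"source": "feed-a", "title": "Resource one"},
    {"source": "feed-b", "title": "Resource two"},
]
weights = {"feed-a": 0.5, "feed-b": 0.5}

# Output layer: readers repeatedly ignore items from feed-b, so its weight
# decays below the threshold and the aggregator stops passing it along.
for _ in range(3):
    weights = feedback(weights, "feed-b", useful=False)

visible = aggregate(items, weights)
```

No component in the sketch decides what the network as a whole should contain; the selection emerges from the accumulated usage signals.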

How do we know this will work? We know, because it does work: it works in human cognition, and it works in artificially developed neural networks. Moreover, we have seen evidence of it working already on the web, through such phenomena as PageRank and blogging networks. The self-organizing network is not merely a pipe-dream, it is here already, and to see it those working in the field need only perform that hardest of all tasks, to recognize it.


Al-Muhajabah, 2003. What Is Trackback? Al-Muhajabah's Islamic Pages, 2003.

APEGGA, 2001. PEGGAsus. The Association of Professional Engineers, Geologists and Geophysicists of Alberta. November 24, 2003.

Bartlett, 2001. Backlash vs. Third-Party Annotations from MS Smart Tags. Kynn Bartlett. WWW-Annotation Mailing List, World Wide Web Consortium. June 15, 2001.

Bechhofer, 2003. Tutorial on OWL. Sean Bechhofer, Ian Horrocks and Peter F. Patel-Schneider. 2nd International Semantic Web Conference, October 20, 2003.

Bennett and Metros, 2001. The Promise and Pitfalls of Learning Objects: Current Status of Digital Repositories. Kathy Bennett and Susan Metros. EDUCAUSE, October 21, 2001.

Berners-Lee, 1999. The Semantic Toolbox: Building Semantics on top of XML-RDF. Tim Berners-Lee. World Wide Web Consortium, June 18, 1999.

Bray, 2003. On Resources. Tim Bray. Ongoing, July 24, 2003.

Britannica, 2003. Ockham's Razor. Encyclopædia Britannica. 2003. Encyclopædia Britannica Premium Service. November 23, 2003.

Burton, 2003. RSS Is Not The Solution To Spam. Kevin A. Burton. PeerFear.Org, September 2, 2003.

CreativeCommons, 2003. Creative Commons. Website, 2003.

Crossref, 2003. CrossRef. Website.

Crossref, 2003a. doi info & guidelines. Crossref, 2003.

Doctorow, 2001. Metacrap: Putting the torch to seven straw-men of the meta-utopia, Version 1.3. Cory Doctorow. August 26, 2001.

DOI, 2003. The Digital Object Identifier System.

Downes, 2001. Learning Objects: Resources For Distance Education Worldwide. Stephen Downes. International Review of Research in Open and Distance Learning: 2, 1, 2001.

Downes, 2002. Paying for Learning Objects in a Distributed Repository Model. Stephen Downes.

Downes, 2003. RSS-LOM. Stephen Downes.

Downes, 2003a. Edu_RSS Topics. Stephen Downes. 2003.

Downes, 2003b. Design, Standards and Reusability. Stephen Downes. July 31, 2003.

Downes, 2003c. Edu_RSS Ratings. Stephen Downes. 2003.

Downes, 2003d. Meaning, Use and Metadata. Stephen Downes. August 25, 2003.

Downes, 2003e. Design and Reusability of Learning Objects in an Academic Context: A New Economy of Education?. Stephen Downes. USDLA Journal, Volume 17, Number 1, January, 2003.

Editeur, 2003. Website.

Eysenback, 2001. A metadata vocabulary for self- and third-party labeling of health web-sites: Health Information Disclosure, Description and Evaluation Language (HIDDEL). G. Eysenbach, C. Köhler, G. Yihune, K. Lampe, P. Cross and D. Brickley. AIMA, 2001.

Fitzherbert, 2000. Country Pasture/Forage Resource Profiles. Anthony R. Fitzherbert. Food and Agriculture Organization of the United Nations, 2000. AGRICULT/AGP/AGPC/doc/Counprof/kyrgi.htm

FOAF, 2003. The Friend of a Friend (FOAF) project. Website.

Franklin, 1994. Perl Regular Expression Tutorial. Carl Franklin and Gary Wisniewski.

Friesen, 2003. CanCore Guidelines Version 1.9: Classification Category. Norm Friesen, Susan Fisher, Anthony Roberts, Susan Hesemeier and Scott Habkirk. The Canadian Core Learning Object Metadata Guidelines. CanCore, 2003.

Friesen, 2003a. CanCore Guidelines Version 1.9: Meta-Metadata Category. Norm Friesen, Susan Fisher, Anthony Roberts, Susan Hesemeier and Scott Habkirk. The Canadian Core Learning Object Metadata Guidelines. CanCore, 2003.

Friesen, 2003b. CanCore Guidelines Version 1.9: General Category. Norm Friesen, Susan Fisher, Anthony Roberts, Susan Hesemeier and Scott Habkirk. The Canadian Core Learning Object Metadata Guidelines. CanCore, 2003.

Friesen, 2003c. CanCore Guidelines Version 1.9: Life-Cycle Category. Norm Friesen, Susan Fisher, Anthony Roberts, Susan Hesemeier and Scott Habkirk. The Canadian Core Learning Object Metadata Guidelines. CanCore, 2003.

Friesen, 2003d. CanCore Guidelines Version 1.9: Technical Category. Norm Friesen, Susan Fisher, Anthony Roberts, Susan Hesemeier and Scott Habkirk. The Canadian Core Learning Object Metadata Guidelines. CanCore, 2003.

Friesen, 2003e. CanCore Guidelines Version 1.9: Annotation Category. Norm Friesen, Susan Fisher, Anthony Roberts, Susan Hesemeier and Scott Habkirk. The Canadian Core Learning Object Metadata Guidelines. CanCore, 2003.

Friesen, 2003f. CanCore Guidelines Version 1.9: Educational Category. Norm Friesen, Susan Fisher, Anthony Roberts, Susan Hesemeier and Scott Habkirk. The Canadian Core Learning Object Metadata Guidelines. CanCore, 2003.

Friesen and Anderson, 2003. Preliminary LOM Survey. Norm Friesen and Terry Anderson. Academic ADL Co-Lab Learning Repository Summit, October 8, 2003.

Gilfillan, 2000. Database Normalization. Ian Gilfillan. Database Journal, March 22, 2000.

Gillmor, 2003. RSS Hitting Critical Mass. Dan Gillmor. SiliconValley.Com, August 17, 2003.

Glennerster, 2002. Computational theories of vision. Andrew Glennerster. Current Biology, 12, R682-685, 2002.

Goodman, 2002. An End to Metatags (Enough Already, Part 1). Andrew Goodman. Traffik, September 2, 2002.

Google, 2003. Our Search: Google Technology. Google, 2003.

Google, 2003a. Google News Alerts (BETA). Google, 2003.

Handschuh, 2001. CREAM - Creating relational metadata with a component-based, ontology-driven annotation framework. Siegfried Handschuh, Steffen Staab, and Alexander Maedche. Semantic Web Working Symposium, July 30, 2003.

Hot or Not, 2003. Hot or Not. Website, 2003.

Hunter and Armstrong, 1999. A Comparison of Schemas for Video Metadata Representation. Proceedings of the Eighth International World Wide Web Conference (WWW8), May 11-14, 1999.

IANA, 2002. MIME Media Types. Internet Assigned Numbers Authority, January 2, 2002.

Iannella, 2003. The Open Digital Rights Language Initiative. Renato Iannella. Website, 2003.

Iannella, 2002. COLIS ODRL Metadata Profile. Renato Iannella. COLIS. July 4, 2002.

IEEE, 2002. Position Statement on 1484.12.1-2002 Learning Object Metadata (LOM) Standard Maintenance/Revision. December, 2002.

IEEE, 2003. Learning Object Metadata (LOM) Final Draft. 2003. Was at but has now been stolen from the commons.

IFLA, 2003. Related efforts - Working Group on FRBR (Functional Requirements for Bibliographic Records) - Section on Cataloguing. International Federation of Library Associations and Institutions, 2003.

IMS, 2002. IMS Simple Sequencing Best Practice and Implementation Guide. October 17, 2002.

IMS, 2003. IMS Digital Repositories Specification. October 21, 2003.

ISBN, 2003. ISBN.Org. Website.

ISSN, 2003. ISSN Home page: Navigate the ocean of periodicals with the ISSN Website.

Jotajota, 2003. RSS Spam. Jotajota. rnd(Thoughts). September 9, 2003.

Karieauskas, 2002. Text Categorization Using Hierarchical Bayesian Network Classifiers. Guytis Karieauskas. 2002.

Lagoze, 2003. Metadata Challenges for Libraries. Carl Lagoze. Preprints of the Metadiversity Conference Proceedings, 2003.

Lananas, 2002. Alternative Architectural Concept 2 - Federated Integration. Stephen Lahanas. CETIS, November 12, 2002.

Lassila, 1997. RDF Metadata and Agent Architectures. Ora Lassila. November 21, 1997.

Leroy, 2002. Resource profiles utility. Patrick Leroy. Mainframe Week, January 30, 2002. journals/articles/0004/Resource+profiles+utility

Levine, 2002. Syndicating Learning Objects with RSS and Trackback. Alan Levine, Brian Lamb and D'Arcy Norman. MERLOT, August 8, 2003.

Levitt, 2000. Cocoon: Sanity For Web-Site Management. Jason Levitt. Information Week, May 22, 2000.

Loder, 1996. Neural Networks: An Overview. Chad Loder. February 28, 1996.

Magee and Friesen, 2001. CAREO Overview and Goals. Michael Magee and Norm Friesen. 2000, revised 2001. Campus Alberta Repository of Educational Objects.

Madison, 1997. Functional Requirements for Bibliographic Records: Final Report. Olivia Madison, International Federation of Library Associations and Institutions, 1997.

McGee, 2003. Learning objects: Bloom's taxonomy and deeper learning principles. Patricia McGee. AACE E-Learn, November 18, 2003.

MERLOT, 2003. Peer Review of The Fugues of the Well-Tempered Clavier. MERLOT Music Review Panel. MERLOT, July 10, 2003.

Michigan, 2002. Mammography Machine Operator Performance Evaluation. Michigan Department of Consumer and Industry Services, January 10, 2002. http:// cis_bhs_fhs_bhs_hfs_889_37201_7.pdf

Miller, 2001. RDF Calendar taskforce. Libby Miller. Institute of Learning and Research Technology, Bristol University, April 10, 2001.

Miller, 2003. RDF Annotations. Libby Miller. Institute of Learning and Research Technology, Bristol University, April 4, 2003.

Monthienvi, 2001. Educational Metadata: Teacher's Friend or Foe? Rachada Monthienvichienchai, Angela Sasse and Richard Wheeldon. Euro-CSCL, January 17, 2001.

Naraine, 2003. Is RSS the Answer to the Spam Crisis? Ryan Naraine. InternetNews.Com, September 1, 2003.

NEC, 2003. CiteSeer. Website. 2003.

Nesbit, 2002. A Convergent Participation Model for Evaluation of Learning Objects. John Nesbit, Karen Belfer and John Vargo. Canadian Journal of Learning and Technology Volume 28(3) Fall / automne, 2002.

Netscape, 1996. Inline Plug-Ins. Netscape Communications Corporation, 1996. Mirror,

Nilsson, 2003. RDF binding of LOM metadata. Mikael Nilsson. Centre for User Oriented IT Design, January 15, 2003.

NISO, 2000. ANSI/NISO Z39.84 -2000 Syntax for the Digital Object Identifier. National Information Standards Organization.

NISO, 2002. Data Dictionary - Technical Metadata for Digital Still Images. National Information Standards Organization and AIIM International, June 1, 2002.

Norman, 2003. IMS LOM, Thumbnails, and Relations. D'Arcy Norman. D'Arcy Norman's Learning Commons Weblog, November 12, 2003.

Norman, 2003a. CanCore Metadata Guidelines Updated. D'Arcy Norman. D'Arcy Norman's Learning Commons Weblog, September 11, 2003.

OASIS, 2002. DIG35: Metadata Standard for Digital Images. OASIS Cover Pages, June 10, 2002.

OASIS, 2003. Digital Object Identifier (DOI) System. OASIS Cover Pages, March 15, 2003.

Oliver, 2003. FRBR Functional Requirements for Bibliographic Records: What is FRBR and why is it important? Chris Oliver. Canadian Metadata Forum, September 19, 2003.

Paskin, 2003. DOI Handbook, version 3.3: Glossary. Norman Paskin. International DOI Foundation, November, 2003.

PRISM, 2003. Publishing Requirements for Industry Standard Metadata. Website.

PURL, 2003. Persistent Uniform Resource Locator.

Recker and Wiley, 2001. A non-authoritative educational metadata ontology for filtering and recommending learning objects. Recker, M.M. and Wiley, D.A. Journal of Interactive Learning Environments, Swets and Zeitlinger, The Netherlands, 2001. Referenced in

Rightscom, 2003. The MPEG-21 Rights Expression Language: A White Paper. Rightscom, July 14, 2003.

Ruiz, 2003. Hierarchical Text Categorization Using Neural Networks. Miguel E. Ruiz and Padmini Srinivasan. Information Retrieval, 5, 87-118, 2002.

Ryan, 1998. Costner's "Postman" Stamped. Joal Ryan, E! Online, March 23, 1998.,1,2726,00.html

SAS, 2003. Diagrams for Relational Metadata Types. SAS 9 Open Metadata API Reference, 2003.

Schulmeister, 2001. Taxonomy of Multimedia Component Interactivity A Contribution to the Current Metadata Debate. Rolf Schulmeister. Studies in Communication Sciences. Studi di scienze della communicazione. Special Issue (2003) - S. 61-80.

Senior Citizen's Guide, 2003. Resource Profiles. Senior Citizen's Guide, retrieved 2003.

Shirkey, 2003. Otlet: Some ideas die because they are wrong. Clay Shirky. Corante: Many-to-Many, November 20, 2003. otlet_some_ideas_die_because_they_are_wrong.php

Smith, 2003. Well-Tempered Clavier: Johann Sebastian Bach: Twenty-Seven Fugues and Select Preludes. Tim Smith and David Korevaar. Northern Arizona University.

Sullivan, 2002. Death Of A Meta Tag. Danny Sullivan. Search Engine Watch, October 1, 2002.

Sun, 2003. What Is Inheritance? Sun Microsystems. The Java Tutorial, 2003.

Stufflebeam, 1971. The relevance of the CIPP evaluation model for educational accountability. Stufflebeam, D. L. Journal of Research and Development in Education. 5(1), 19-25., 1971. Cited in

Sutton, 1999. IEEE 1484 LOM mappings to Dublin Core: Learning Object Metadata: Draft Document v3.6. Stuart A. Sutton. IEEE Learning Technology Standards Committee (LTSC), September 5, 1999.

Swartz, 2000. RDF Site Summary (RSS) 1.0. Aaron Swartz.

Technorati, 2000. Technorati. Website, 2003.

Tillett, 2002. The FRBR Model (Functional Requirements for Bibliographic Records). Barbara B. Tillett. Workshop on Authority Control among Chinese, Korean and Japanese Languages (CJK Authority 3), March, 2002.

Trott, 2003. Comment Spam. Ben Trott. Six Log, October 13, 2003.

Udell, 2003. Working with Bayesian Categorizers. Jon Udell. XML.Com, November 19, 2003.

VMC, 2002. Multimedia Metadata Standards. Virtual Museum Canada. Canadian Heritage, April 27, 2002.

W3C, 1999. Resource Description Framework (RDF) Model and Syntax Specification. World Wide Web Consortium, February 22, 1999.

W3C, 2001. Web Services Description Language (WSDL) 1.1 W3C Note 15 March 2001. World Wide Web Consortium.

W3C, 2002. Open Digital Rights Language (ODRL) Version 1.1. W3C Note 19 September 2002. World Wide Web Consortium.

W3C, 2003. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Working Draft 10 October 2003. World Wide Web Consortium.

W3C, 2003a. Cascading Style Sheets home page. November. World Wide Web Consortium.

WCHS, 1998. Siskel & Ebert. WCHS-TV News 8, 1998 (and not updated in five years).

Wikipedia, 2003. Reification.

Williams, 2000. Evaluation of learning objects and instruction using learning objects. David D. Williams. The Instructional Use of Learning Objects, David A. Wiley, ed.
