Sept 20, 2003
"Metadata is one of those words which are fashionable, frequently used, and little understood."
We just concluded a symposium on the preservation of electronic records. There was phenomenal interest. The public came on the Sunday, and this place was jammed (they want to know, will the love letters sent on the internet last until the wedding). And the public would probably be interested in this issue as well, if they knew what it was about.
Our key issue, for the library, is how to take this mass of information we hold, how do we make it accessible, how do we open it, how do we ensure increased access in a coherent way, and how do we link it to the National Gallery, the other government agencies, the municipal governments. It relies on metadata at the very heart. The National Library has played a key role in this, developing vocabularies, the GoC metadata implementation guide, the controlled vocabulary register, and more. Metadata is the key to the knowledge. It opens our industry.
Access is heavily dependent on controlled and consistent vocabularies; it is the means by which we can provide consistent services online, and it is a priority at the highest level. The focus is on Canadians - what do they need? How do we come together to meet their needs, to meet their expectations?
Université de Montréal
Link - (requires an SVG viewer, Acrobat 5 has this)
The metadata map project began in 2002 as a supplementary project. It is designed like a subway map; you can move about the map like a train. The names of the lines are based on major topic areas. Each line has a route map - eg., the Dissemination route has stops such as LDAP, etc. Each stop is a particular metadata format; click on the station and you obtain an explanation of the format. The pages, in turn, link to the home page or important resources for that format. There is also a version with an index.
Why not an alphabetical list? Because there are almost 200 metadata initiatives, grouping by themes was useful. Using a map also allows us to show the relationships between the formats. And the index gives us an alphabetical list anyways, so we don't lose anything. The key here is the efficient way information is presented, the way it is on the London Underground map. There is a convention to these maps - compare, eg., with the Berlin map, the Paris map.
The map was organized first by mapping information processes: creation, organization, preservation and diffusion. This was mapped against institutions with expertise in information. Third, types of digital files were recorded. Though it was originally planned to settle on one of these, the map uses all three. The organization was accompanied by a search for the information.
It became a classification exercise: grouping by theme with no guidelines, followed by grouping by theme with guidelines. As with any classification, our groupings are arbitrary. The lines were identified, then attempts were made to connect them. It was, of course, an iterative process.
The map is drawn in SVG. SVG is an XML graphics format endorsed by the W3C. It is like Flash, but public. It has "really taken off now." It was used for the metamap because the map is intended to be a public tool, and SVG is a public tool in the same spirit. It works in any computing environment. It can be edited as a text file. And it supports links to external resources.
The scope of the map includes metadata standards, sets and initiatives (MSSIs) related to information science. The goal was to gather the useful MSSIs in one place. Information that is not directly related, or not useful, was excluded. That doesn't mean the scope is tightly defined; it is left vague so that it stays flexible. Specific subject-area metadata was not included, though the granularity may be reconsidered in the future. The 'yellow line' represents major institutions, such as W3C or IEEE, that are not MSSIs but are nonetheless important.
Plans for the future: first, include versions in other languages (Spanish and Portuguese versions are underway, offers have been made for Chinese, German and Turkish). Then create finer-grained 'local neighbourhood' maps. When you leave the station and are 'in the street' you can see a local area map. It might look like a neighbourhood with stores, banks, offices, etc. But what we'd really like is a large XML file with all this information, which would generate the map automatically. But we need to know whether this is even theoretically possible.
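On the question of generating the map from one XML file: a minimal sketch suggests the mechanical part is feasible, at least for a toy case. The XML vocabulary below (`metamap`, `line`, `stop`, the `x`/`y` attributes) and the example.org links are invented for illustration and are not the project's actual format. The sketch also sidesteps the hard part, automatic layout, by taking hand-supplied coordinates.

```python
import xml.etree.ElementTree as ET

# Hypothetical source format: one <line> per topic area, one <stop> per metadata format.
SOURCE = """
<metamap>
  <line name="Dissemination" colour="red">
    <stop name="LDAP" x="40" y="60" href="https://example.org/ldap"/>
    <stop name="Z39.50" x="140" y="60" href="https://example.org/z3950"/>
  </line>
</metamap>
"""

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_svg(source_xml: str) -> str:
    """Turn the hypothetical map description into an SVG document string."""
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "200", "height": "120"})
    for line in ET.fromstring(source_xml).findall("line"):
        stops = line.findall("stop")
        # The route itself: a polyline through every stop, in document order.
        points = " ".join(f"{s.get('x')},{s.get('y')}" for s in stops)
        ET.SubElement(svg, f"{{{SVG_NS}}}polyline",
                      {"points": points, "stroke": line.get("colour"), "fill": "none"})
        for s in stops:
            # Each station is a clickable circle linking to the format's page.
            link = ET.SubElement(svg, f"{{{SVG_NS}}}a",
                                 {f"{{{XLINK_NS}}}href": s.get("href")})
            ET.SubElement(link, f"{{{SVG_NS}}}circle",
                          {"cx": s.get("x"), "cy": s.get("y"), "r": "5"})
            title = ET.SubElement(link, f"{{{SVG_NS}}}title")
            title.text = s.get("name")
    return ET.tostring(svg, encoding="unicode")
```

Calling `build_svg(SOURCE)` returns SVG text that can be saved to a file and opened in any SVG viewer. Adding a station then means adding one `stop` element rather than redrawing the map.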
McGill University Library / Canadian Cataloguing Committee
Functional Requirements for Bibliographic Records
FRBR: the report had two purposes: to provide a clearly defined framework for relating data. The FRBR is defined from a user-centric approach, based on an analysis of user tasks. Which parts of the records are used to perform which tasks?
What are the user needs:
Others? Add, manage, navigate, attribute royalties to, preserve... But it's the first four of these that are in the report.
The model itself is simple:
- Group 1 - Products of intellectual or artistic endeavours (per FRBR)
A work is realized through an expression, which is embodied in a manifestation, which is exemplified by an item
- Work - no single material object one can 'point to' as the work; it is the commonality of content between the various expressions of the work
- Expression - the intellectual or artistic realization of the work - eg., alphanumeric notation, musical notation, etc. Eg., French language expression. Or, eg., the notation or code used. Or, eg., the sounds that correspond to the words.
- Manifestation - the physical embodiment of the expression
- Item - the exemplar of the manifestation
- Group 2 - Those responsible for creating, producing, etc.
- Corporate bodies
- Group 3 - subjects of works
- Object etc
- Group 1 - Products of intellectual or artistic endeavours (per FRBR)
- Attributes - each entity has a set of characteristics or attributes. For example, a work has a title, form or genre.
- Corporate body
- Concept ... etc
- Relationships - these are links between the entities - eg., collocation (a grouping together) - relation is what assists the user to navigate through the data.
- Between entities, same work - eg., person/work (created by); person/expression (performed by, translated by) - eg., Hamlet - original text, translations, versions ...
- Between entities, different works - eg successor, supplement, complement, summarization, adaptation, transformation, imitation
- Same work: abridgement, revision, translation, arrangement
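The Group 1 chain (work, expression, manifestation, item) can be sketched as nested records. This is only an illustration of the hierarchy, not a full FRBR implementation: the attribute choices, publisher names and barcodes are made up, and Groups 2 and 3 are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single exemplar of a manifestation (e.g. one copy on a shelf)."""
    barcode: str


@dataclass
class Manifestation:
    """The physical embodiment of an expression (e.g. a particular edition)."""
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of the work (e.g. the original text or a translation)."""
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract creation; it exists only as what its expressions share."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def items_for(self, language: str) -> List[Item]:
        """Collect every item of every manifestation of one expression."""
        return [item
                for e in self.expressions if e.language == language
                for m in e.manifestations
                for item in m.items]


hamlet = Work("Hamlet", [
    Expression("en", [Manifestation("Publisher A", 1992, [Item("0001"), Item("0002")]),
                      Manifestation("Publisher B", 2001, [Item("0003")])]),
    Expression("fr", [Manifestation("Publisher C", 1995, [Item("0004")])]),
])
```

Here `hamlet.items_for("en")` gathers copies across both English editions, which is the kind of grouping above the manifestation level that the model makes possible.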
Why is this model useful? It maps from the user tasks to the attributes and the relations. What does the user use to search and obtain the item? It helps us understand what is valuable in the bibliographic record. It looks at records from within the point of view of a large database. And it broadens the focus from manifestations: we are also seeing the work and expression.
The model improves navigation for the user - it reintroduces logical indexing versus purely mechanical filing. And it improves the display of information (especially online) to the user. When the computer came along, you couldn't really 'shape' the catalogue (the way you could a physical catalogue) through, eg., grouping - this reintroduces that. It also improves services. Eg., if you can put a 'hold' at the expression level, you can get the book you want no matter which particular manifestation it is.
The model also puts a new spin on old problems. Consider, eg., the content versus carrier problem. Where is 'carrier' in the hierarchy? At the manifestation level. Or consider, eg., the separation between abstract and physical entities.
- VTLS Virtua
- Variations 2
- OCLC (uses inheritance - the metadata of the work, eg., is inherited by the expressions)
- ISBD community
- AACR community
|Key question: does vocabulary create ontology? Because, for example, an expression exists, does a work exist? Consider the debates between Ockham and Scotus - are the 'essences' of things real?|
Halton Hills Public Library
Public Library Applications
Traditional library services include collection catalogues, indexing services, etc.
Challenge: how do we bring things together at that level?
But also, some libraries in our district (Halton) have directories of agencies and services. Think of the concept of '211' as an alternative to '411' and '911' which would provide access to these agencies. One thing we would want to do, for example, is match volunteers to opportunities. We are looking for data regarding "what's happening in our community". There is a lot of commonality between what we do and what they do.
Third, we also have local historical and genealogical information. Thus we have newspaper archives, image archives, books, census reports, wills, property records, military records, cemetery information, business directories, maps....
These types of information have different characteristics appropriate to the medium, and there are challenges when you try to build across these things...
Testbed: Maritime History of the Great Lakes. This is composed entirely of volunteer, contributed content; none of this was paid for, it was all given to the site. We have "people out there somewhere creating things that we want to deliver and package." We have something like 15,000 articles in the newspaper database: we transcribe or image the text. How do we take the donated material and turn it into something that has meaning?
Metadata schemas are a way to distribute or share data, not a way to store it. Metadata schemas are largely about how you encode content; good metadata schemas allow you to support multiple metadata schemas. For example, you wouldn't standardize on Dublin Core - you want more complexity at the lower end, then you can support multiple formats.
Images Database needs:
- Export routine to transform the data to Images Canada in Dublin Core
- Work with NCSA to support OAI output format
- Crossnet's Zeblib to translate into MARC and SUTRS to answer Z39.50 queries
Community Information Database needs:
- CIOC's gateway uses Zedlib to deliver MARC 21
- Could be integrated to broadcast Z39.50
- Export to Microsoft Access
A general problem is that when we create binary objects, they generally don't know much about themselves. We need them to be self-descriptive at the resource description level. If you lose the contact between a TIFF and the record that describes that TIFF, you have issues.
From the public community, we have a strong demand for simplified access to a broad range of information, beginning with municipal information, but also at all levels. We have far too many silos with native interfaces that don't do a good job of distributing. We also want localized, customized interfaces to searched data. We need 'broadcast searching' - just-in-time searching, single place to look, which can be built across native databases.
The same data can support a wide range of metadata standards. But we need standards-based interfaces that allow broader searches.
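The idea that one stored record can feed several exchange formats can be sketched as follows. The internal field names and the sample record are invented for illustration; the Dublin Core element names (`title`, `source`, `date`) are real, but the wrapper element and the choice of mapping are assumptions, not the library's actual export routine.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical internal record: richer than any one exchange schema.
record = {
    "headline": "Schooner launched at Oakville",
    "newspaper": "Example Gazette",
    "published": "1874-05-02",
    "page": 3,
    "column": 2,
}


def to_dublin_core(rec: dict) -> str:
    """Export the subset of the record that simple Dublin Core can carry."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for element, value in (("title", rec["headline"]),
                           ("source", rec["newspaper"]),
                           ("date", rec["published"])):
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


def to_citation(rec: dict) -> str:
    """A second output from the same record, using the finer-grained fields."""
    return (f'"{rec["headline"]}", {rec["newspaper"]}, {rec["published"]}, '
            f'p. {rec["page"]}, col. {rec["column"]}')
```

The page and column survive in storage even though the Dublin Core export has no place for them, which is the argument for keeping more detail at the storage level than any single schema requires.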
|Key question: will there be one key metadata standard? Probably not - we have to learn to work across standards. This means multiple vocabularies. Unity and diversity.|
Faculty of Information Studies, University of Toronto
RAD (Rules for Archival Description) and EAD (Encoded Archival Description)
(Crossword image of metadata) Nature of archival material - by contrast to FRBR, it is not the output of intellectual action; it is evidence of an action or event. Eg., a birth certificate is not intellectual property, it is evidence that a passage took place. Archival material is typically a byproduct of a business or personal activity. I write a memo for a purpose, to create an action, and in an archive it has a secondary value. We talk about 'unconscious creation' - when we create a memo, we are not thinking of a history, we are trying to create an activity. And archives are aggregated in different ways than the library would think: documents from a trip, for example: a journal, a map, a ticket stub.
The purpose of archival description is to provide access to the material. It is also to create an understanding of the material, since all the information about the record isn't in the record itself. This is done by showing the relation between records, the scope and content, or information about the document. Eg., consider a letter of complaint. If it's in a file labeled 'crackpots' it takes on a whole different meaning. Finally, an archive must maintain a standard of trustworthiness and authenticity - we need to know where the record went and who had custody of it.
Archival description: begins with archival creator. Authorship is of secondary value; rather, the question is who collected or aggregated it. I didn't author my ticket stub, but my having collected it is important. Remaining categories include aggregates, multi-level, and finally, context. Who created it, when, and why? What are the procedures that went into creating this document?
Archives have 'finding aids'. They have inventories that describe the fonds and parts, catalogue records, thematic guides, calendars, and file and item lists.
In Canada, the major standard is RAD (Rules for Archival Description). Internationally, standards include ISAD and ISAAR. And (maybe) a Canada-U.S. standard: CUSTARD - "but the CUSTARD didn't set very well... we will not have a North American standard."
RAD - describes from the general to the specific. The first level of description is the fonds. It does not provide rules for collections. It uses the AACR structure, with variations. We do not focus on authorship; we focus on provenance. RAD provides rules for describing all types of materials, including multimedia material.
For structure standards, MARC is used (by American archivists and some university archives in Canada). It is not widely used because it is not multi-level. There is also Encoded Archival Description (EAD), a multi-level XML standard ("created by archivists for archivists"). There is a new initiative, called Encoded Archival Context (EAC). "We have a different way of looking at things, and we need to come together where the library world is not the elephant."
Cross-Domain Metadata - the agencies that need to come together include the library community, the museum community and the archive community. IFLA items focus on manifestation and items. Archivists focus on context. Museums focus on...? What brings them together? The SPIRT metadata scheme is only one approach... it's a view of the world that archivists would understand.
"We have very different views of the world" "Categories are historically situated artifacts and like all artifacts are learned as part of membership in a community of practice." - Bowker and...
Users do not differentiate searches by professions. They want access regardless of institution type. To meet our user needs we must collaborate and develop compatible metadata schemes.
Comment: also the records management community, as a fourth community.
Comment: harmonization won't happen. Reality is more like a Venn diagram. We all see items - we need a way to create things at the item level. We need the 'ANDs' and the 'NOTs'. Response: not an either-or. But you talk about the item and go out - we start at the fonds and go down.
|Key question: what is the common element? What is the foundation? Or: what is the thing that we name (as opposed to describing, or as opposed to composing out of named things). Items? Things? What about fonds? There isn't - and cannot be - a base level ontology here.|
Sarah Klotz and Lorraine Gadourt
Using Metadata to Describe Archival Holdings
New techniques - new stages of archiving. For example, digital scanning or microfiche. For example, an artifact from 1666. Or various drawing types. Or a list of roles of various officials. It's a way to make these artifacts accessible. The archival record also consists of a description of the artifact, where it was found, etc. (Sample of archival record displayed). These are placed in a database to assist in research and discovery. (Example of another format) This facilitates research by providing a simple form people can use to locate artifacts. Output may be a brief listing or a complete description.
These archives were first described at the turn of the century; there were no standards, but since there was a clear purpose there was some consistency. The current project is to describe these records. There are both French and English records, and these vary. We chose EAD because it doesn't impose a structure. It doesn't instruct what and how to describe.
We needed both standard and divergent elements. The descriptions, the indexing, and the archive were created in different centuries. A more stringent standard would have made it impossible to represent the material. Also, if we had had to redescribe materials, it would have added enormous costs. "The key to interoperability is not just the use of a common standard... it is flexibility... the ability to uphold, explain, all the intricacies that exist in descriptions."
|Key question: does contemporary archiving and indexing rewrite what has gone before?|
Diana Dale (Facilitator)
Objective of the panel: to foster an awareness of multimedia and its special metadata needs. Multimedia.. some combination of media.
What is MPEG - Moving Picture Experts Group? It's a bunch of people who work out standards for the delivery of video.
MPEG-1 - not a metadata standard - an audio-visual compression standard. Achieves 1.5 Mbits per second. It removes spatially and temporally redundant data. The audio compression is known as MP3, and took off when Napster took off. It was originally intended for CD-ROM and video CD.
MPEG-2 - Also not a metadata standard; it is an extension of MPEG-1 and supports interlacing of frames. Uses higher bandwidth - 2 to 8 Mbits per second. It is used for DVDs (there was supposed to be an MPEG-3 for HDTV, but MPEG-2 did everything that was needed).
MPEG-4 - Also not a metadata standard, but is designed to deliver things over the internet. It supports different profiles, for example, the gaming profile, the animation profile, etc. The bandwidth range is 8 kbits per second to 1 Mbit per second.
MPEG-7 - is a metadata standard - multimedia content description interface. It's very complex, not used widely, but the subject of a lot of experimentation. You can describe the audio characteristics of a piece of video, the motion characteristics, etc., including the semantic content. There is an organization called tv-anytime.org
MPEG-21 - also a metadata standard, which we are not allowed to talk about today. ;)
Metadata can be used to capture various sorts of information. The advantage of metadata is that it captures all of the information. Contrary to popular opinion, it preserves the content well and makes it accessible. Metadata allows for multimedia management, as well as searching. "A mega-comprehension of multimedia."
Alex Eykelhof
"When you're dealing with video, metadata takes us past the black box." Would you buy a book that you couldn't view, that forced you to start at the beginning? Metadata allows us to get past that for multimedia. DVDs are getting us there, eg., you can bounce to a particular scene. This is especially important for learning - they want to go right to where the information is, to use an index.
"Without quality metadata, it is impossible to access, describe..." Not only a map, but also a structure. The description is usually the first level of encounter; without the description it floats anonymously. Administrative metadata translates into digital rights. DRM holds the key to locking or unlocking access, and this is a critical tool to ensure appropriate use.
This morning we talked about rules for description. There are many ways to create a description. We must develop a metadata card, and the method to fill out the card. Staged migration to minimize risk.
Take for example, a video collection. You have a metadata database. There exists a description for each object. There may be millions (300 million) of documents. Your video is encoded. We want to capture the data in a transparent and automatic method. We have content in obsolete formats; it is in danger, and we must do a migration. This gives us the opportunity to add metadata. We capture, eg., speech to text, which is then indexed and tagged with a time code. This lets us do, eg., closed caption, as well as indexing and search.
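The step of indexing speech-to-text output against time codes can be sketched like this. The transcript segments are invented, and a real system would get them from a speech recognizer and would likely add stemming and stop-word handling; the sketch only shows how a word lookup leads back to a position in the video.

```python
from collections import defaultdict

# Hypothetical speech-to-text output: (start time in seconds, recognized text).
transcript = [
    (0.0, "welcome to the harbour tour"),
    (12.5, "the lighthouse was built first"),
    (47.0, "the harbour froze every winter"),
]


def build_index(segments):
    """Map each word to the sorted start times of the segments containing it."""
    index = defaultdict(list)
    for start, text in segments:
        for word in set(text.lower().split()):
            index[word].append(start)
    return {word: sorted(times) for word, times in index.items()}


def timecode(seconds: float) -> str:
    """Format seconds as HH:MM:SS for display or seeking."""
    whole = int(seconds)
    return f"{whole // 3600:02d}:{whole % 3600 // 60:02d}:{whole % 60:02d}"


index = build_index(transcript)
```

A search for "harbour" returns two start times, each of which `timecode` turns into a position a player could jump to. The same segments, shown in order, would serve as closed captions.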
It didn't start out as a metadata project. We started out to solve a practical problem - in general, don't use metadata just because you can, use it to solve a problem. The problem: borrowing the (one and only) copy of a multimedia asset - solving this problem led to the video on demand project, which led to the need to access only parts of a video. We did a pilot project, using MPEG-7 and CanCore to mark up the video.
We create a wide range of projects, which means that the selection of metadata is critical to us. We want to adopt a metadata framework that best reflects the library's digital content. Text, video, audio, flash, 3D, etc. We want to be sure the archival resource is available in the future. We are looking at METS. Key features of METS include: descriptive metadata for discovery - but this is optional. METS does not require a particular scheme for description. The same with administrative metadata. The metadata can either be wrapped, or pointed to. Files can be grouped, pointed to or contained. The structural map outlines a hierarchical structure describing the complex object.
We can fully automate the creation of metatags, and we can automatically retrieve video segments. We are now in a position to create a system for content sharing. Looking at a maturation of the technologies. "Du planetaire" - global sharing of data. Planning and standardization of metadata are very important.
We are now in the analysis phase, gathering statistical data. We are looking at refining our use of MPEG-7 and CanCore, and will spend time on rights management. We also need money. Most of the work is being done manually. We also need a rights clearing house.
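The METS features listed above (wrapped versus pointed-to metadata, a file section, a structural map) can be sketched as a skeleton document. The element and attribute names are taken from the METS schema as I understand it, but the output is not validated against the schema, and the IDs, title and example.org locations are placeholders.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def build_mets() -> ET.Element:
    root = ET.Element(m("mets"))

    # Descriptive metadata wrapped inside the METS document itself.
    wrapped = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(wrapped, m("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC}}}title").text = "Sample video"

    # Descriptive metadata held elsewhere and only pointed to.
    pointed = ET.SubElement(root, m("dmdSec"), {"ID": "DMD2"})
    ET.SubElement(pointed, m("mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "OTHER",
                   f"{{{XLINK}}}href": "https://example.org/record.xml"})

    # The files that make up the object, referenced by location.
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    file_el = ET.SubElement(file_grp, m("file"), {"ID": "FILE1"})
    ET.SubElement(file_el, m("FLocat"),
                  {"LOCTYPE": "URL",
                   f"{{{XLINK}}}href": "https://example.org/video.mpg"})

    # The structural map: a hierarchy of divisions pointing at files.
    struct = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(struct, m("div"), {"TYPE": "video", "DMDID": "DMD1 DMD2"})
    ET.SubElement(div, m("fptr"), {"FILEID": "FILE1"})
    return root
```

`ET.tostring(build_mets(), encoding="unicode")` gives the serialized document. Nothing in it forces Dublin Core: the wrapped section could carry any descriptive scheme, which is the flexibility the notes describe.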
We will be able to capture metadata more easily, and also conduct metadata recon projects. But what is cool today is cold tomorrow; "Rust never sleeps." The means to ensuring access in the future is well-documented metadata today. Also, looking at the growth of institutional repositories. We need professionals to lead the way with standards.
CHIN - Canadian Heritage Information Network
Museum Metadata and CHIN
Information gets to us now through a mapping process. There are three databases accessible through the CHIN site: humanities, natural sciences, and archeological. There are also three data dictionaries created over the years with the museum community. There are 613 fields in all in this database.
www.chin.gc.ca/English/Artefacts_Canada/index.html - mapping of fields ('Crosswalk').
Also did a mapping in natural history to a standard called Darwin Core.
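A crosswalk of this kind is, at its simplest, a lookup table from local field names to target terms. In the sketch below the local field names and the specimen are invented and are not CHIN's data dictionary; the targets are Darwin Core term names as currently published.

```python
# Hypothetical local field names on the left; Darwin Core term names on the right.
CROSSWALK = {
    "Taxon": "scientificName",
    "Accession No.": "catalogNumber",
    "Institution": "institutionCode",
    "Collecting Site": "locality",
}


def apply_crosswalk(record: dict, crosswalk: dict):
    """Return (mapped record, list of source fields with no target)."""
    mapped, unmapped = {}, []
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target is None:
            unmapped.append(field_name)
        else:
            mapped[target] = value
    return mapped, unmapped


specimen = {
    "Taxon": "Acer saccharum",
    "Accession No.": "EX-0042",
    "Institution": "EXMUS",
    "Curator's Notes": "Leaf slightly damaged",
}
mapped, unmapped = apply_crosswalk(specimen, CROSSWALK)
```

Reporting the unmapped fields rather than silently dropping them matters when the source has hundreds of fields and the target standard has far fewer.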
Some museum standards:
- Categories for the Description of Works of Art - 275 categories and subcategories. A subset documents visual resources. There is also a document for cataloguing cultural objects. Object ID, also from Getty, is an international standard to collect the information needed to identify cultural objects - important recently because of stolen objects. Fields include inscriptions, markings and distinguishing features.
- Museum information standards.
Metadata: semantics: we started with Dublin Core, then for education we used GEM (Gateway to Educational Materials). Also used Schoolnet descriptors, because GEM subject headings were not useful in a Canadian context.
Metadata pragmatics: we wanted to automate metadata but it was not that helpful. So we created a toolkit consisting of a harvester and a cataloguer. The harvester validates the URL, then begins the retrieving / cataloguing process. We were going to validate URLs but it hasn't happened yet. The cataloguing tool was designed to be simple, easy to use, and short, since we are not using professional cataloguers.
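The harvester-then-cataloguer flow might look roughly like the sketch below. It is an assumption about the shape of such a toolkit, not CHIN's code: the check is purely syntactic (no network request), and the stub record's field names are invented.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Syntactic check only: an http(s) scheme and a host. No request is made."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def harvest(urls):
    """Split submitted URLs into stub records for the cataloguer and rejects."""
    accepted, rejected = [], []
    for url in urls:
        if is_valid_url(url):
            # Only the identifier is filled in automatically; a person supplies the rest.
            accepted.append({"identifier": url, "title": None, "description": None})
        else:
            rejected.append(url)
    return accepted, rejected
```

Keeping the stub short reflects the design goal stated above: a form simple enough for people who are not professional cataloguers.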
Elements of the cataloguing tool:
- change management
- quality assurance
Resources show up on the Virtual Museums website. There is also a teachers' centre on the site. You can search, browse, etc.
Our newest product is 'Community Memories'. People create history exhibits, then fill out a form on the site. They fill out the five tags required by the Government of Canada and some additional tags.
issues: search, syndication...
Comment: as a researcher, I might want to know a lot about an object that isn't just in Dublin Core. You want to be able to produce multiple output formats.
A learning object is "any digital resource that can be reused for the purpose of teaching and learning." Reused is the key word here - it is much more productive to create small pieces of content and share them. To have content not reused is just not viable economically. They therefore need to be interoperable and portable.
A learning object repository is a collection of digital assets or metadata that can be searched without prior knowledge of the repository structure.
CanCore is based on and fully compliant with the first e-learning standard, IEEE 1484.12.1, LOM 1.0, metadata. It is a multipart standard intended to facilitate search, acquisition and use of metadata - resources need to be discoverable, described, annotated by instructors.
CanCore is an application profile, a "customization of a standard to meet the needs of particular communities with common application requirements." Because IEEE-LOM was developed by engineers, a lot of the terms are odd to educators, so CanCore makes these meanings clear.
CanCore tried to bridge the gap between the "structuralist" approach of IEEE-LOM and the "minimalist" approach of Dublin Core.
E-learning has been largely technology driven, and the specifications have focussed on syntax; CanCore has tried to focus on the meanings of the terms. A similar project has been undertaken with Dublin Core. It's an attempt to sharpen the focus of IEEE-LOM and to make the job of implementors easier.
These guidelines provide clarifications of each element, best practice recommendations, examples of use, and more.
Athabasca has created a simple metadata creation tool, similar to CHIN's. AdLib uses the simplest and most important LOM elements. http://adlib.athabascau.ca
Another project is the MARC -> LOM translator (or Crosswalk). http://marc-lom.athabascau.ca/marc/index.html
Learning object repositories:
- 1972 - TELUQ
- 1984 - Canal Savoir
- 1992 - LICEF
- 1999 - Cogigraph
- 2000 - CIRTA
eduSource - network of learning object repositories. Also, LORnet, a research network.
Standards are used everywhere in the e-learning system. SCORM, for example, ensures that a learning package can be used by an e-learning system even though it was produced by another one. Learning Design is a standard way of describing course structure.
Back to eduSource - it brings together some groups that were involved in e-learning standards. We are creating a suite of tools, also called 'Repository in a Box'.
- It is directed toward the 'third wave' programmable (or semantic) web. LO metadata is the tip of the iceberg: it gives learning objects semantics; if you don't have metadata you can only search around syntax.
- Once you do that, you can help in knowledge management - KM has to be more than just document management; you need to work with the knowledge representation to get access to the knowledge that you need.
- It also supports object-based learning environments.
- Interoperability standards.
eduSource is an open source infrastructure that allows owners of different types of metadata to contribute to the educational network.
eduSource is being built - but we can see an e-learning system in operation with Explor@. We can use the system to find an object; we select it and place it in a sub-repository. Then we reference objects by describing them using the CanCore terminology. We use the system to search for them, then using Explor@ we can aggregate them. These aggregations can become new objects that are put back into the repository.
There are of course technical challenges. Consider software development: it started with separate tools, then the integrated suite, then finally interoperability at the desktop level. We are witnessing the same thing in e-learning, and that's why metadata is so important. In e-learning, today's LMS or LCMS is the functional equivalent of the integrated suite. But we need to move forward toward web services interoperability.
But the most important challenges are pedagogical and cultural challenges. For example, learning objects are economical and flexible, but they force pedagogical rethinking: you need to create activities, choose pedagogies, etc. It also shows that we have to attach more importance to communication and learning activities than to media or document selection. This sort of system also allows us to separate content objects from use scenarios and learning environments.
The cultural obstacles are even more difficult. We have issues of lecturing vs facilitation. Many university teachers lack pedagogical training. Professors often teach the way they were taught, even when online. There is also the issue of IP versus open source, and a need to foster collaboration between content creator and the institution.
Comment: question of governance issues - who looks after it, quality of objects, who is going to use it? Response: we have to have different levels of learning objects. If you want to guarantee the quality of learning objects, then you need librarians.
Content Delivery and Rights Management
Talking about cross-domain interoperability, not just the content discussed at this forum, but also the content produced by the content industries. But there will be parallels that can be drawn.
My project has to do with TC46/SC9 of ISO, identification schemes. The context of this project: new business models are emerging, driven by technology supporting the seamless integration of media and the delivery of content through open networks. This is driving these industries to reposition themselves: there are new players, and old players are changing. Silo models are breaking down.
To make this transition successful it is necessary to create technology to support cross-domain applications. These are emerging at the technical level. At the semantic level we are also seeing some technology put into place, for example, MPEG 21. There is also the Index Project, the ABC Harmony project, and others.
The agencies in this study see a new role for standard identifiers. Historically, identifiers, such as ISBN, grew out of one industry. But now there is a recognition that they have a function across domains, and they need to be refined and extended. And the identifier alone won't do the job; there is a need to supplement the identifier with descriptive metadata.
The objective of the study was to develop a shared frame of reference for describing business and information transactions.
Business architecture: That means a common understanding of the nature of those transactions, and placing that into a structured form. This involves identifying the functions performed by individuals and agencies. We want to highlight the key business relationships identified by those functions. These break down into content delivery and rights management.
Functions: originate, own, consume, produce, distribute, administer, register, monitor, certify...
Information architecture: we need to understand what entities we're dealing with. Three major groups: production cycle, distribution cycle, rights management.
In the production cycle, the key object we're dealing with is the product - this is the thing that ends up on our shelves and the gallery walls. Key events are production and release. The key agent is the producer. This becomes more complex when you realize that the product is the content embodied within a physical object. So in addition to the producer, we have the creator of the content.
From the rights management perspective, we bring in the concept of the property that is embodied in that object. Consider a CD. The content in it is the tracks of music. In the content there may be several layers: the work, which belongs to the composer or the lyricist, then the performance, which belongs to the performer, and the production, which belongs to the producer. Because we have property involved, we must invoke the idea of authorization.
In the distribution cycle, we have the product, distribution, distributor. From the rights and authorization perspective, users come into the picture. We get resource access, resource use, etc.
Interoperability may be viewed from several perspectives. From the functional perspective, we see product, content and property. Granularity - what is distributed? Identity - what constitutes a change?
|Key question: Who performs the functions? Do libraries authenticate? Clear rights? Monitor? What functions are appropriate in a public space? An information space?|
Treasury Board of Canada, Secretariat
Government of Canada Metadata Framework
Various frameworks in the government of Canada: management accountability, management of information, enhanced management. Frameworks evolved as follows:
- 1970s - Library cataloging - MARC format
- 1980s - Rules for Archival Description (RAD)
- 1990s - FGDC - geospatial metadata
We wanted a common approach. The Government of Canada metadata framework is based around the idea of CLF (common look and feel) across websites. This is not just visual: it has to do with navigation and finding items. The basic metadata employed is Dublin Core (DC). From there we get a 'Venn Diagram' with DC at its core, with additional metadata elements to support different functions, for example: records management, portal management, domain-specific data.
Environment Canada has a framework containing three levels of metadata: discovery, access and use. In other words, they adopt a flexible strategy to meet each of these three needs. Public access (discovery) can be supported with Dublin Core; specialized access is supported with the CSDGM specification. Use - such as data sharing - uses much more extensive profiles.
DC contains 16 elements (including elements). Five are mandatory on government websites.
Health Canada - three categories of metadata standards: descriptive (eg., DC), lifecycle standards, for business processes, records management, etc., and administrative standards.
Standards such as DC give us a set of metadata elements. But over and above that, we must consider metadata values, such as controlled vocabularies and encoding schemes - will there be a standard set of values? Eg., for date, we have adopted ISO date format. These decisions can be encoded, and this encoding creates an application profile. The application profile, in turn, is recorded in a registry, which supports computer-to-computer understanding of metadata.
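To make the idea of an encoded application profile a little more concrete, here is a minimal Python sketch of a profile that a program can check a record against. Everything in it is an assumption made for illustration: the `PROFILE` structure, the choice of required elements, the two department names in the creator vocabulary, and the `validate` function are hypothetical, not the actual Government of Canada profile.

```python
import re
from datetime import date

# Hypothetical application profile: which elements are required and
# which values each element may take. Illustrative only.
PROFILE = {
    "title": {"required": True},
    "creator": {
        "required": True,
        "vocabulary": {"Health Canada", "Environment Canada"},
    },
    "date": {"required": True, "encoding": "iso-date"},
    "language": {"required": False, "vocabulary": {"eng", "fre"}},
}


def is_iso_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def validate(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, rules in profile.items():
        value = record.get(element)
        if value is None:
            if rules.get("required"):
                problems.append(f"{element}: missing required element")
            continue
        vocabulary = rules.get("vocabulary")
        if vocabulary is not None and value not in vocabulary:
            problems.append(f"{element}: '{value}' is not in the controlled vocabulary")
        if rules.get("encoding") == "iso-date" and not is_iso_date(value):
            problems.append(f"{element}: '{value}' is not a YYYY-MM-DD date")
    return problems


good = {"title": "Air quality report", "creator": "Environment Canada", "date": "2003-09-20"}
bad = {"title": "Air quality report", "creator": "J. Smith", "date": "Sept 20, 2003"}
print(validate(good))
print(validate(bad))
```

The point of the sketch is only that once the decisions about values are written down as data, two systems can apply the same checks without a human in between.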
What we have so far is a semantic model: we have metadata elements, values, and we have adopted international standards. These can be added to locally. But really, the more commonality, the more we can agree, the better chance for interoperability. DC has established some of these for us (eg., element encoding schemes, DCMI type vocabulary, ISO 639-2).
We adapted DC because we need certain types of information. For example, we say DC:Creator must identify a government department. But we also need to accommodate diversity. The needs are different and achieving commonality is difficult. Controlled vocabularies must be registered and publicly accessible; these are managed by the National Library.
Principles are evolving for the development of broad, high-level schemes. Four major principles:
- Applicable - terms must represent content found on a significant number of government websites
- Recognizable - so they can be used by non-experts
- Client-centric - concepts and terminology must be tested with the public
Government of Canada schemes (on Treasury Board website, under 'information management')
- Titles of fed organizations, GEDS
- dc: subject
- GOC core subject thesaurus
- dc: coverage
- Cdn geographic names database
- Regions of Canada
- dc: audience
- GOC audience scheme
Looking to the future - we can look at the element sets as facets that can work together to facilitate information access, for example, to support searches. But more powerful is the idea that the different metadata elements work together to support different kinds of access. Metadata facets include:
- stage of a business in lifecycle or size
We want to move beyond information and into services and transactions - we want to use metadata to describe services. But we need to be careful developing schemes, and we need to pay attention to the quality of the metadata we create (though there will always be a human dimension to metadata creation - content creators must be given the tools).
Comment: when I search, I search across the whole web, not just the GOC website. Reply: our work is focussed on improving search in the GOC website. We don't yet see a metadata search on the website because finding a way to present a metadata search is not easy.
Comment: can we connect government business outcomes to metadata? Reply: we have been given that task. We need some overall big picture, where does metadata fit in? There has to be a direct link to show how the metadata supports the business of the government.
Comment: Why is Treasury Board in charge of metadata? Response: we try to work collaboratively. Within the government, the Treasury Board has the mandate to set policy standards, information management standards. It's also about public access.
Submission form and new website being launched next Saturday for distribution of GOC news.
News was traditionally delivered through traditional media outlets, or wire services, press releases, etc. We have more than 20 different departments, each with a news site, creating many different locations for news. The news site was created to provide a single point of access. "Anyone with internet access gets the same information, unfiltered, at the same time as everyone else."
Phase I was launched in May, 2002. It is maintained by Canada Newswire. It extended content to include speeches, etc. It is now being adapted to use metadata. The current site does not yet support metadata, nor is it accessible. Today there is an awkward system for distributing news.
Phase 2 improvements: better query results by using DC; will also use controlled vocabularies, including audience and coverage, for targeted information. It also takes advantage of the worldwide shift from searching content to searching metadata. This introduces context to resource discovery. It also helps delivery through RSS and wireless formats.
Phase 2 uses metadata to populate the website. Eg., five major news types, target audiences, regions. News items are defined using the controlled vocabularies. Search will support standards extensions (application profiles). Phase 2 thus provides streamlined delivery of government news (nice network diagram).
Old scenario: departments to newswire, who transform for delivery to public. New scenario: departments directly to public. Syndication technologies are used as an alternative to email subscription. (Explanation of RSS) There will be 35 RSS feeds.
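For readers who did not see the RSS explanation, the following Python sketch shows roughly how news items tagged with a controlled audience vocabulary could be turned into one RSS 2.0 feed per audience. It is an assumption-laden illustration, not the news.gc.ca implementation: the `NEWS` records, the audience terms, the example.org URLs, and the function names are all hypothetical. Only the standard library is used.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical news records; field names, terms and URLs are illustrative only.
NEWS = [
    {
        "title": "New funding for rural broadband",
        "link": "https://example.org/news/1",
        "published": datetime(2003, 9, 20, 14, 0, tzinfo=timezone.utc),
        "audience": ["rural residents", "business"],
    },
    {
        "title": "Updated food labelling rules",
        "link": "https://example.org/news/2",
        "published": datetime(2003, 9, 19, 9, 30, tzinfo=timezone.utc),
        "audience": ["consumers"],
    },
]


def build_feed(title, link, news_items):
    """Serialize news records as an RSS 2.0 document (returned as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "News items for " + title
    for news in news_items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = news["title"]
        ET.SubElement(item, "link").text = news["link"]
        # RSS 2.0 expects RFC 822 style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(news["published"])
        # Carry the controlled-vocabulary terms along as categories.
        for term in news["audience"]:
            ET.SubElement(item, "category", domain="audience").text = term
    return ET.tostring(rss, encoding="unicode")


def feed_for_audience(term, news_items):
    """Build a feed holding only the items tagged with one audience term."""
    selected = [news for news in news_items if term in news["audience"]]
    link = "https://example.org/feeds/" + term.replace(" ", "-")
    return build_feed("News for " + term, link, selected)


print(feed_for_audience("consumers", NEWS))
```

Because each feed is just a filter over the same tagged items, adding a feed for a new audience or region means adding a vocabulary term, not building a new site.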
|Key question: news.gc.ca distributes using XML syndication technologies. But why doesn't it receive in this format? Why does it not aggregate from individual department RSS or similar feeds?|
Records Management Metadata for the Government of Canada
Records management metadata is metadata that ensures the management of current records. It assists in the access of records, links records to business activities, links records to those responsible, and helps ensure authentic, trustworthy and reliable records.
- Departmental identifier - shows the provenance of the record
- Organization - indicates accountability
- Document number - system-generated unique identifier
- Author - who created the record
- Trustee - who is responsible for the care of the record
- Signed by - the person with the signing authority, for authenticity
- Designated Recipient - the to, cc fields
- Subject name
- Subject Code
- Date - may include date of creation, other lifecycle dates
- Essential status
- Access rights - default is 'read access by all'
- Security - security designation
- Location - physical or electronic
- Final - maps to final version of the record
- History - actions performed on the record, edited, read, etc
- Preservation and Migration Period
- Retention Period - how long it is retained before authorized disposition
- Retention Trigger - Starts the retention period clock
- Disposition action - destroyed, alienated or transferred
- Disposition date
- Appendix A: metadata concordance table
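The retention elements in the list above work together: the trigger starts the clock, the period says how long it runs, and the result is the disposition date. A minimal Python sketch of that relationship follows. The record layout, the assumption that the retention period is counted in whole years, and the function names are all hypothetical and are not taken from the Government of Canada specification.

```python
from datetime import date


def add_years(start, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def disposition_date(record):
    """Date on which disposition becomes due, or None if the clock has not started."""
    trigger = record.get("retention_trigger_date")
    if trigger is None:
        return None
    return add_years(trigger, record["retention_period_years"])


def is_due(record, today):
    """True once the retention period has fully elapsed."""
    due = disposition_date(record)
    return due is not None and today >= due


# Hypothetical record: the trigger (for example, a file being closed) fired
# on 2003-09-20 and the record must be kept for seven years.
record = {
    "document_number": "DOC-0001",
    "retention_period_years": 7,
    "retention_trigger_date": date(2003, 9, 20),
    "disposition_action": "transferred",
}
print(disposition_date(record))
print(is_due(record, date(2009, 1, 1)))
```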
Comment: how to map transactions into the subject field? Reply: use of thesaurus, classification. But we have to look at this.
Comment: classification codes - end users do not generate good classification codes. Reply: we are having a lot of discussion about this. It's typically a shelf locator. But we are looking at a numbering scheme that connects with functions and activities.
Comment: Where will the extra metadata be used? Also, when the document is modified, is this recorded, and where - what is the relation with Archives? Reply: we would hope that other software would use them. Also, we are working with archivists and trying to make sure our records will work with them.
Comment: with huge volumes of email, we know that operationally only a small percentage is of archival significance. Given the large costs to assigning metadata, how do you approach this? Reply: maybe email will drop off because of the workflow process, then all the records will be captured, because it's all in the document. But there is a big issue with legacy email.
|Key question: There is a relation between use and major metadata fields, eg., classification. How is this use captured, how is it mapped to (say) classification? What, in the end, is the role of user in classification - because we can't just say they produce bad metadata. When does use suggest we don't need a record any more?|
Standards in the Book Industry Supply Chain
I wonder, after all this, should I even be here, should the supply chain even be here? I look at the nice elements and say, yup, that's the way it works. But it's the wild west, because it's a supply chain. It's a very different world. It isn't thought out conceptually.
In English Canada, the primary relationship for Canada is with the U.S. - about 70 percent of the books available here come from the U.S. Because of Canada's history there is also a connection between Canada and the U.K. English Canada was historically separated from French Canada, but the emergence of Indigo Chapters has created a relationship. There is, of course, a relation between Quebec and France.
There is a commercial imperative shaping supply chain standards development. Also an entrepreneurial imperative. The major players are booksellers (especially chains and internet booksellers: Indigo, Amazon, independents, college; Renaud-Bray, Archambault), publishers, distributors and wholesalers (who would be interested in availability data; the Big 5 distributors, esp. Indigo), independent data services (R.R.Bowker, Titlewave, PubStock, BTLF, Electre), and trade associations and government (who are concerned with the overall success of the industry; ACP, CPC, CBA, CBWA, ALQ, ANEL, ADELF).
In the U.S., wholesalers (such as Baker & Taylor, Ingram) are very important and are often at the table. U.S. data providers include Bowker, Pubnet/Pubeasy, and Ingram. Nielsen Bookscan is also involved in tracking point-of-sale data. Associations include AAP, ABA, and NACS. The major distributor in the UK is Gardners, while Nielsen Bookscan is also involved. Also PubEasy. Associations include PA and BA. batch.co.uk
The major tasks include:
- Descriptive metadata
- Identifiers (ISBN, etc., subject codes (which vary in regions))
- Basic Metadata (title, author)
- Enhanced metadata: cover, TOC, reviews
- Packaging and Trading metadata
- Identifiers (EAN13, UPC, GTIN (includes packaging metadata))
- Bar code standardization (the size of the bar code is an ongoing issue)
- Availability metadata - cf. eg., PubStock, for availability queries over the net, also Ingram and Bowker - APIs are increasingly important
- Commercial metadata
- Ecommerce - DRM, manufacturing
- Point of Sale (retail, eBook (had to be developed, to at the very least report sales))
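The list above names both ISBN and EAN13 as identifiers, and the two are linked by simple arithmetic: a 10-digit ISBN can be carried inside a 13-digit EAN by adding the 978 prefix and recomputing the check digit. The Python sketch below shows that standard calculation; the function names are made up for this example.

```python
def ean13_check_digit(first12):
    """Check digit for the first 12 digits of an EAN-13 number.

    Digits are weighted 1, 3, 1, 3, ... from the left; the check digit
    brings the weighted sum up to a multiple of 10.
    """
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def isbn10_to_ean13(isbn10):
    """Convert a 10-digit ISBN (hyphens allowed) to its 978-prefixed EAN-13."""
    core = isbn10.replace("-", "")
    if len(core) != 10:
        raise ValueError("expected a 10-character ISBN")
    body = "978" + core[:9]  # drop the old ISBN-10 check character
    return body + str(ean13_check_digit(body))


print(isbn10_to_ean13("0-306-40615-2"))
```

The printed value is the number that would be encoded in the bar code on the book's cover.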
Standards organizations include:
- Canadian Book Industry Advisory Committee (CBISAC)
- Dept. of Canadian Heritage
- Canadian focus
- Canada (French)
- BISG - Book Industry Study Group - (BISAC - Book Industry Standards and Communications)
- Canada (English)
Major initiatives include BISAC, EDItEUR, ONIX... "Unfortunately the most popular type is 'other' - the most popular thing is the spreadsheet". But most are unifying under ONIX (2.1 is the current version). ONIX handles basic metadata and distribution, and is designed to be extensible. Input comes from BISAC committees. The ISO group revising ISBN has selected ONIX for this.
Other initiatives include identifiers (ISBN, IOTC, DOI), packaging and trading metadata, availability metadata, commercial metadata. See especially EDItx for trading.
Plug for Booknet Canada.
Comment: will there ever be some sort of common core? Reply: ONIX, EDItx. But we can't even agree on what the quality of a book is. Eg., the concepts we have of 'trade paperback' and 'mass market paperback' aren't used in the UK, and in the French system they don't use 'format' at all. Because of the cultures we work in there will always have to be translations done. Also: subjects are never meant to be used by the end user.
|Key question: What parts of traditional commerce are applicable online? How do book industry standards carry over into online content?|
Natural Resources Canada
Geomatics is the science of gathering, interpreting, etc., geographic data. Based on latitude and longitude, or on some grid coordinates.
GeoConnections was launched in 1989 to provide access to geospatial data. They are working on web-based applications. Defined by the Canadian Geospatial Data Infrastructure (CGDI). It answers the question, "Where on earth is it?" This is especially important in emergencies. 9-11 workers depend on digital maps to find their way.
The GeoConnections Discovery Portal system was built on the Canadian Earth Observation Network technology. Keywords and thesauri have been developed. Datasets were developed using the Federal Geographic Data Committee (FGDC) standard. Three profiles: biological data profile, shoreline metadata profile, extensions for remote sensing metadata.
The services side of the site was completely overhauled - we were using a proprietary system, but are working with standards now. We had to create a keyword set to allow the services side of the portal to be discovered.
University of Ottawa
In Canada we have cost-recovery structures for data, so data is not readily accessible. So people create their own data, because they can't afford it. Creating metadata is something they never did.
The Integrated Metadatabase (IMDB) is a collection of facts about each of Statistics Canada's 400+ surveys. The basic element is the 'survey object'. We know that metadata can mean all kinds of different things: we wanted to focus on a particular use, to help human users interpret statistical metadata, especially things like survey methodology, description, concepts and variables, etc.
The database was implemented in November, 2000, and covers survey description, methodology and data quality. It is published on the STC website, with daily updates. There are efforts underway to improve metadata quality. Metadata is based on the ISO/IEC 11179 data element registry standard. For the statistical part of it, other models were used - Corporate Metadata Repository (US Census Bureau) and Bo Sundgren of Statistics Sweden.
Data elements are administered; associated with each data element is an administered component (in ISO/IEC 11179). The extension is to apply the idea of an administrative item to more entities of a data element. Thus we have 'statistical activity' - a survey, a survey's coverage, a survey 'instance', and also methodology.
The next phase of IMDB will be to extend the content and the published data through CANSIM. Additions to the site will include a list of variables for every survey, links to definitions, classifications, sources of online data.
We need to have a way of consistently structuring the data that we publish: there must be a consistent way of naming variables, for example. To be searchable, we need not only meaningful names but consistent names. So we use: statistical unit (the things we observe) + property (thing being measured) + representation (form of the data) = variable. These three elements are used to create the name of the variable.
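The naming rule can be sketched in a few lines of Python. This is only an illustration of the "unit + property + representation" idea: the `Variable` class, the underscore-joined lower-case name format, and the example labels are assumptions of this sketch, not the convention actually used in the IMDB.

```python
from dataclasses import dataclass


def normalize(label):
    """Lower-case a label and join its words with underscores."""
    return "_".join(label.lower().split())


@dataclass(frozen=True)
class Variable:
    statistical_unit: str  # the thing we observe
    prop: str              # the thing being measured
    representation: str    # the form of the data

    @property
    def name(self):
        """Build the variable name from its three components, always in the same order."""
        parts = (self.statistical_unit, self.prop, self.representation)
        return "_".join(normalize(part) for part in parts)


income = Variable("Household", "Total Income", "Dollars")
same_thing = Variable(" household ", "total  income", "DOLLARS")
print(income.name)
print(income.name == same_thing.name)
```

Building the name from the components, rather than typing it by hand, is what makes names consistent as well as meaningful: two people describing the same variable end up with the same string.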
University of Toronto
Web Accessibility Issues, or, "Finding the Suitable Needle in the Correct Haystack"
When you get bifocals, many of your choices are restricted. When you break your leg, your choice of route is restricted. When choices are limited, it becomes a question of 'must have' rather than the 'right thing'. Sometimes information access becomes a necessity: when you need to know the ingredients if you have a peanut allergy, for example.
The challenge is the task of translating many languages, some of them secret, with source material in many locations and sometimes hidden.
What is accessible content? Two questions: how transformable is it? Or, is there an alternative but equivalent presentation of the same information?
Accessibility for LIP (Learner Information Package) (ACCLIP) - How do I want things to be displayed? How do I want things to be controlled? An example implementation is Web4All, which allows you to save personal preferences on a smart card, then allows you to configure the publicly accessible (CAP Site) computer to your preferences.
Application profiles for accessibility - including controlled vocabularies, machine processable - but for accessibility, this needs to be much more global. ACCMID - accessibility profile - contains information about conformance to accessibility standards, and answers to the questions above.
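One way to picture how a user's stated preferences and a resource's accessibility metadata could meet is the small Python sketch below. It is a loose illustration only: the dictionary fields (`cannot_use`, `requires`, `alternatives`), the example.org URLs, and the function names are invented for this example and do not reproduce the actual ACCLIP or accessibility metadata schemas.

```python
def suits(preferences, presentation):
    """A presentation suits the user if it needs no modality the user cannot use."""
    return not (set(presentation["requires"]) & set(preferences["cannot_use"]))


def choose_presentation(preferences, resource):
    """Return the URL of the first suitable presentation, or None if there is none."""
    candidates = [resource["primary"]] + resource.get("alternatives", [])
    for presentation in candidates:
        if suits(preferences, presentation):
            return presentation["url"]
    return None


# Hypothetical user profile, as might be stored on a smart card.
preferences = {"cannot_use": ["audio"]}

# Hypothetical resource metadata: a video plus an equivalent captioned version.
resource = {
    "primary": {"url": "https://example.org/talk.mp4", "requires": ["video", "audio"]},
    "alternatives": [
        {"url": "https://example.org/talk-captioned.mp4", "requires": ["video", "text"]},
    ],
}
print(choose_presentation(preferences, resource))
```

The match only works if the resource metadata honestly records what each presentation requires, which is why the authoring question matters so much.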
The authoring of accessible metadata - the usual question of who will author this metadata. Within the TILE project, an "integrated, unconscious authoring" of metadata occurs during content production or assembly. www/barrierfree.ca/tile
"We may have more labels than we have things out there."
Chair, Standards, Research and Development Sub-committee of the National Advisory Board for Canadian Culture Online, Department of Canadian Heritage
Defining a Metadata Strategy for Online Cultural Content
What it comes down to is using metadata for access - and we need to ask what the point of the access is and how is it used?
We're not going to reinvent standards - the question is how we use them, especially for content producers and end users (ie., people). Canadian Culture Online Program (CCOP): three objectives: to achieve a critical mass of content in English and French; to build a conducive environment for Canada's cultural industries; to increase visibility and build audiences for Canadian digital content. There is a future generation - there is a demographic divide - there is an audience that is evolving.
Committees: Canadian Culture Online National Advisory Board - mandate to advise the minister on the general direction of the program, and to inform on the needs of the program. Subcommittees: Gateway and access subcommittee, content and innovation, and standards and research development. This latter tries to identify the needs of users.
The issue, for me, as a chair of a standards committee, is that standards are actually the last thing anybody would want. We need to show their importance. It relates to the idea of producing content: we welcome standards so long as they don't limit our ability to create, and to make sure that content is accessible, not only today, but also tomorrow.
To make it work on the operational level: What is the level of granularity? What are the implications of tagging cultural content? What should we expect as measurable outcomes? What kind of tools? What kind of support do recipients require?
Next step: clause in CCOP funding agreements addressing metadata. We want to find a way to make it implicit and embedded in their practice. What does it mean when we make creators into catalogers? Also, we want a Canadian repository of cultural objects.
National Research Council
Questions and Possibilities: The Four-Dimensional Future of Metadata