Downes.ca ~ A Voice in The Void

Dec 19, 2003
By Stephen Downes

A compilation of some posts to the CC-Education mailing list.

The comment (by Heather Ford) about OpenOffice is interesting.

As you may know, MS is signing deals with various content providers under which their content is delivered exclusively into Office products as subscription-based resources (think of it as 'son-of-Smart-Tags'). This method of circumventing normal information distribution channels effectively gives MS control over content distribution (the terms of which would be enforced by the MS Rights Management Server).

It is worth thinking about setting up an open system whereby OpenOffice users are able to tap directly into (CC and equivalently licensed) RSS feeds distributing free content. For example, a direct connection between OpenOffice and Wikipedia would be most useful.

It is going to be important for those who promote open access to think of the content infrastructure as an interconnected network extending from the author directly through to the user's desktop applications. Anything less, and it will be controlled and monopolized.

Heather Ford wrote:

This sounds really interesting, Stephen - would love you to explain more about how you think this could work/look. I've been thinking a lot about how to make this connection more direct than just having a list of open content on the cc website. I'd love to find a way of extending the user-friendliness that CC is becoming well-known for, in developing this side more - but I'm still trying to work out how it would look.

Obviously I don't have all the details worked out in my mind. But there are two major aspects, plus an optional third:

Treat the internet (or more specifically, the network of free content repositories) as though it were a local file system
Create more mechanisms within the software itself to access that file system
(Optional) Provide usage metadata back to the system

On the first point, the idea is that free learning (and other) resources are available for access from within the OpenOffice software in a manner analogous to the way that files are currently accessible. Thus, the system would allow anybody to open such material straight from the content repository into an OpenOffice application.

This would most likely be instantiated as a three-step operation:

locate materials
view materials
select all or part of the materials for use

The first part is key, since you can't simply browse hundreds of thousands of resources the way you can browse your hard drive (you can't even browse some hard drives any more, including mine). Thus, an effective search mechanism would be required. Parts of the search criteria would be specified automatically by the means of access (see the second point). For example, if a user selected 'Insert an Image' then the search would look for free images, not text documents. The search should also try to draw on other environment variables. Knowing the user's language or country of origin, for instance, would help set preferences.
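To make this a little more concrete, here is a minimal Python sketch of how the means of access and the user's environment might be folded into a search query. The action names (`insert_image`, `insert_text`), the `SearchQuery` fields and the license labels are all hypothetical; no existing OpenOffice or repository API is implied.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical mapping from the command the user chose to the kind of
# resource the search should return.
ACTION_MEDIA_TYPES = {
    "insert_image": "image",
    "insert_text": "text",
}

@dataclass
class SearchQuery:
    terms: str
    media_type: Optional[str] = None
    language: Optional[str] = None
    country: Optional[str] = None
    # Only free content is wanted; these license labels are illustrative.
    licenses: Tuple[str, ...] = ("cc", "public-domain")

def build_query(terms: str, action: str, locale: str) -> SearchQuery:
    """Combine what the user typed with what the environment already tells us."""
    # A POSIX-style locale such as "en_CA.UTF-8" carries language and country.
    language, _, country = locale.split(".")[0].partition("_")
    return SearchQuery(
        terms=terms,
        media_type=ACTION_MEDIA_TYPES.get(action),  # None means "any type"
        language=language or None,
        country=country or None,
    )

query = build_query("owls", "insert_image", "en_CA.UTF-8")
print(query.media_type, query.language, query.country)  # image en CA
```

Anything the environment cannot tell us is left as `None`, so the search simply does not filter on it.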

If I were designing the search, I would allow the searcher to select default content repositories so I could pick repositories dedicated to a specific topic. For example, I would probably use something like Edu_RSS to search for related articles. The idea here is to treat the network of content repositories as a self-organizing network, as described in the last section of my paper, Resource Profiles.
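A rough sketch of the 'default repositories' idea: search only the repositories the user has picked, in the order picked, and merge the results. Each repository is stood in for here by an in-memory list of metadata records; in practice these would come from the repositories' feeds. The repository names and record fields are made up for the example.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical metadata record: title, description, url

def search_repository(records: List[Record], terms: str) -> List[Record]:
    """Return the records whose title or description contains every search term."""
    words = terms.lower().split()
    hits = []
    for rec in records:
        text = (rec.get("title", "") + " " + rec.get("description", "")).lower()
        if all(word in text for word in words):
            hits.append(rec)
    return hits

def federated_search(repositories: Dict[str, List[Record]],
                     defaults: List[str], terms: str) -> List[Record]:
    """Query the user's default repositories in order, skipping duplicate URLs."""
    seen = set()
    results = []
    for name in defaults:
        for rec in search_repository(repositories.get(name, []), terms):
            if rec["url"] not in seen:
                seen.add(rec["url"])
                results.append(dict(rec, repository=name))
    return results

repositories = {
    "education": [{"title": "Owls of Canada", "description": "nocturnal birds",
                   "url": "http://example.org/owls"}],
    "general": [{"title": "Owl photographs", "description": "",
                 "url": "http://example.net/owl.jpg"}],
}
for hit in federated_search(repositories, ["education"], "owls"):
    print(hit["repository"], hit["url"])  # education http://example.org/owls
```

Note that the search touches only the metadata records, never the resources themselves.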

The second part is a mechanism to preview items before inserting them into your document. I don't know what to say about this, really.

The third part is the payoff. Once the user has reviewed the resource, she selects either all or part of it. The resource is then copied directly into the document in the appropriate format. For example, suppose the user found a text on owls and located a paragraph worth quoting. She highlights the paragraph and then double-clicks (say). The paragraph is then copied into the document, and a tag or caption is placed into the document identifying the bibliographical information and the Creative Commons (or equivalent) license information (the latter is key for commercial users, who will not reuse content unless they are certain, and can prove, that they have permission to use it).
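The payoff step might look something like the following sketch: the selected text comes back together with a caption carrying the bibliographic and license information. The `Resource` fields and the caption wording are assumptions, and the owl title and author are invented sample data.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    # Hypothetical metadata fields the repository would supply with the resource.
    title: str
    author: str
    url: str
    license_name: str
    license_url: str

def paste_with_attribution(selection: str, source: Resource) -> str:
    """Return the selected passage followed by a citation and license caption."""
    caption = (
        f'Source: {source.author}, "{source.title}", {source.url}. '
        f"Licensed under {source.license_name} ({source.license_url})."
    )
    return f"{selection}\n[{caption}]"

owl_text = Resource(
    title="A Short Guide to Owls",  # invented sample data
    author="A. N. Author",
    url="http://example.org/owls",
    license_name="Creative Commons Attribution 1.0",
    license_url="http://creativecommons.org/licenses/by/1.0/",
)
print(paste_with_attribution("Most owls hunt at night.", owl_text))
```

Because the caption travels with the pasted text, a commercial user can later show exactly where the material came from and under what terms.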

On the second major point, we want to make access to these resources integral to the application itself. It is worth noting in passing that this is a service that repositories (including the ContentForge being discussed) should provide to any application, not just OpenOffice, though the authors of each application would need to write code to access and use the content.

In a program like OpenOffice, I would certainly include a function such as this in an 'Insert' option. But I would go further. I would also allow a user to highlight a word, double-click (say), and be shown a default set of resources for that word, such as definitions, encyclopedia entries, etc. Highlighting a person's name would yield a list of resources authored by the person, or (switch a toggle) a list of resources about that person. Certain keywords in the highlighted text would trigger certain sorts of search results (the user wouldn't need to learn these; the idea is that the system responds to what the user is asking for). For example, 'graph' would tend to return graphs, 'map' would return maps, 'flag' would return flags. Different repositories could handle this in their own way.
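Here is one possible way to turn a highlighted selection into a query, as a sketch. The trigger words are the three given above; the query keys (`media_type`, `field`, `creator`, `subject`) are hypothetical, and how the application decides that a selection is a person's name is left open: the caller simply says so.

```python
import re
from typing import Dict, Iterable, Optional

# Illustrative trigger words; each repository could handle these its own way.
MEDIA_KEYWORDS = {"graph", "map", "flag"}

def media_hint(words: Iterable[str]) -> Optional[str]:
    """Return the first trigger word found, allowing a simple plural 's'."""
    for word in words:
        stem = word[:-1] if word.endswith("s") else word
        if stem in MEDIA_KEYWORDS:
            return stem
    return None

def interpret_selection(selection: str, is_person: bool = False,
                        about: bool = False) -> Dict[str, object]:
    """Turn highlighted text into a query the user never has to write by hand."""
    words = re.findall(r"[a-z]+", selection.lower())
    query: Dict[str, object] = {"terms": selection.strip(),
                                "media_type": media_hint(words)}
    if is_person:
        # The toggle switches between works by the person and works about them.
        query["field"] = "subject" if about else "creator"
    return query

print(interpret_selection("rainfall map of Alberta")["media_type"])  # map
```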

On the third point, I would suggest that users be given the option to report usage of materials back to the system, much like RSS trackback, except that only metadata would be sent back (since the document does not have a URL just yet). The usage metadata sent back captures the context of use and helps describe the resource being used, which in turn lets the aggregator process search results more accurately in future searches.
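A sketch of this optional third piece, assuming a simple JSON report and a hypothetical aggregator. This is not the trackback protocol; it only illustrates metadata-only reporting (no document URL) and how counts of use in a given context might nudge later search rankings.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable, List

def usage_report(resource_url: str, context_keywords: Iterable[str],
                 application: str) -> str:
    """Build a metadata-only usage report; no document URL is included."""
    return json.dumps({
        "resource": resource_url,
        "context": sorted({k.lower() for k in context_keywords}),
        "application": application,
    })

class Aggregator:
    """Hypothetical aggregator that uses usage reports to re-rank later searches."""

    def __init__(self) -> None:
        # resource URL -> how often it was used in each context keyword
        self.usage = defaultdict(Counter)

    def receive(self, report: str) -> None:
        data = json.loads(report)
        self.usage[data["resource"]].update(data["context"])

    def rank(self, urls: List[str], query_terms: Iterable[str]) -> List[str]:
        terms = [t.lower() for t in query_terms]
        # Resources used more often in matching contexts come first;
        # sorted() is stable, so ties keep the repository's original order.
        return sorted(urls, key=lambda u: -sum(self.usage.get(u, Counter())[t]
                                               for t in terms))

agg = Aggregator()
agg.receive(usage_report("http://example.org/owl.png", ["Biology", "owls"], "OpenOffice"))
print(agg.rank(["http://example.net/other.png", "http://example.org/owl.png"], ["owls"]))
```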

Now again, this sort of content distribution system should not be restricted to OpenOffice. The same mechanism ContentForge uses to make content available to OpenOffice should also be available to allow people using Moodle to import (free) content into an online course. It follows, therefore, that the content offered through such a system could range from something as simple as an image to something as complicated as an animation, a simulation, or a port into an interactive environment.

I don't want to 'shoot anything down' but I do have some comments...

Steve Foerster wrote:

One can browse the entire Internet as one would a hard drive using services like DMOZ and Yahoo! If open content is categorized intelligently, I'd think it would be possible to make it all reasonably accessible by drill down.

I beg to differ. I haven't used dmoz or Yahoo to locate anything on the web since the late 90s. The size of the internet long ago made such directories impractical. Moreover, the a priori classification schemes turn out to be difficult to implement and even more difficult to use, after a certain level of complexity is reached.

Perhaps a useful approach would be to designate a set of meta tags that tag a document as open content and specify which license applies to it? If there were also a recognized categorization scheme that had its own meta tag, then robots could add open content to a directory. Of course, this approach is HTML-centric, but there could be allowances for PDFs, RTFs, Word documents, and other file formats that were available from a page with such tags.

It is HTML-specific, or at the very least text-specific, which means that it won't work at all for non-text formats such as images, videos, animations, simulations, and the like. Given that separate metadata files must be created for these formats, and given that it is better to use a single data representation to facilitate search, it makes sense to use metadata files for everything, à la RSS. Moreover, using metadata typically means we do not need to load entire files to see what they are, so there may be some bandwidth savings.

There are some other issues related to embedding tags in the documents themselves, which I'll raise should it become necessary.

The second part is a mechanism to preview items before inserting them into your document. I don't know what to say about this, really.
Since I'm going meta tag crazy, I'll ask whether the description meta tag is useful here as a sort of thumbnail for text? I guess it depends whether you're looking for a specific paragraph or a whole document.

Think metadata, not meta tag. In such a case, the description metadata plays this role. But my experience is that, when considering whether to *use* something, people like to see the actual resource, not just a 'thumbnail' description.

The third part is the payoff. Once the user has reviewed the resource, she selects either all or part of it....
I suppose the open content directory that uses a meta tag system could also generate code for use by those publishing modified documents.
"If you use this content in your document, you must add the following HTML code to your document's header:

Right idea, but again I would do it slightly differently.

CC licensing information can be very easily added to metadata describing a resource. RSS 1.0 has a CC module. I am developing / implementing a very similar system to use ODRL in learning resource metadata.
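As an illustration, the sketch below builds an RSS 1.0 item carrying a `cc:license` element and reads the license back out, using only Python's standard `xml.etree.ElementTree`. The CC namespace URI is the one I understand the RSS 1.0 CC module to use, but treat it as an assumption and check the module documentation; the item values are invented.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
CC = "http://web.resource.org/cc/"  # assumed RSS 1.0 CC module namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("", RSS)
ET.register_namespace("cc", CC)

def rss_item(url: str, title: str, description: str, license_url: str) -> ET.Element:
    """Build an RSS 1.0 <item> whose cc:license points at the license URL."""
    item = ET.Element(f"{{{RSS}}}item", {f"{{{RDF}}}about": url})
    for tag, text in (("title", title), ("link", url), ("description", description)):
        ET.SubElement(item, f"{{{RSS}}}{tag}").text = text
    ET.SubElement(item, f"{{{CC}}}license", {f"{{{RDF}}}resource": license_url})
    return item

def license_of(item: ET.Element) -> Optional[str]:
    """Return the license URL from an item's cc:license, or None if it has none."""
    lic = item.find(f"{{{CC}}}license")
    return None if lic is None else lic.get(f"{{{RDF}}}resource")

item = rss_item("http://example.org/owls", "Owls", "A short text on owls",
                "http://creativecommons.org/licenses/by/1.0/")
print(ET.tostring(item, encoding="unicode"))
print(license_of(item))  # http://creativecommons.org/licenses/by/1.0/
```

Because the license sits in the same metadata record the search already reads, an application can check permissions without opening the resource itself.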

If metadata is being (automatically) *created*, then scanning the document may help retrieve such CC information. This can already be done thanks to the existing CC labels that sites use.
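On the scanning side, many pages carrying a CC label mark the license link with `rel="license"` (the markup generated by the Creative Commons license chooser does this, as far as I know). Assuming that convention, a robot creating metadata could pick the license up with Python's standard `html.parser`, as sketched here. Pages that state their license some other way would be missed.

```python
from html.parser import HTMLParser
from typing import List

class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a> or <link> elements marked rel="license"."""

    def __init__(self) -> None:
        super().__init__()
        self.licenses: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be valueless (None).
        rel = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel and href and href not in self.licenses:
            self.licenses.append(href)

def find_licenses(html: str) -> List[str]:
    finder = LicenseLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.licenses

page = ('<a rel="license" href="http://creativecommons.org/licenses/by-sa/1.0/">'
        'Some rights reserved</a> <a href="/about">About</a>')
print(find_licenses(page))
```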

Secondly, I would not say 'you must add the following' code to your document; I would simply include the information as part of the cut-and-paste process. Making it an automatic part of the software makes it a lot easier to use.