Downes.ca ~ The Semantic Social Network

The Semantic Social Network

Feb 14, 2004
By Stephen Downes

Two types of technologies are about to merge. The technologies are content syndication, used by blogging websites around the world, and social networking, employed by sites such as Friendster and Orkut. They will merge to create a new type of internet, a network within a network, and in so doing reshape the internet as we know it.

The purpose of this article is two-fold. On the one hand, it is to describe the emerging Semantic Social Network (SSN) and to sketch the nature of the new internet we are about to experience. On the other, it is to promote the development of the SSN by describing to developers the sorts of systems required and by suggesting how the SSN can be used.

1. Starting Points

To begin, allow me to briefly outline where we are today by describing the two types of technologies that are about to merge.

RSS - or Rich Site Summary (or Really Simple Syndication) - is a metadata format used to describe online content. An RSS file, sometimes known as a 'feed', is an XML file summarizing a website's contents. The RSS file consists of two major sections: a 'channel' element, which describes the website as a whole, and a series of 'item' elements, which describe individual resources.
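To make the channel/item structure concrete, here is a minimal sketch that reads a feed and separates the two sections. It assumes the RSS 2.0 layout, where items sit inside the channel. RSS 1.0 is RDF-based and namespaced, so it would need different lookups. The feed and its URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical blog</description>
    <item>
      <title>First post</title>
      <link>http://www.example.com/first</link>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://www.example.com/second</link>
      <description>More thoughts.</description>
    </item>
  </channel>
</rss>"""

FIELDS = ("title", "link", "description")


def parse_feed(xml_text):
    """Split an RSS 2.0 document into channel metadata and a list of items."""
    channel = ET.fromstring(xml_text).find("channel")
    info = {tag: channel.findtext(tag) for tag in FIELDS}
    items = [{tag: item.findtext(tag) for tag in FIELDS}
             for item in channel.findall("item")]
    return info, items


channel, items = parse_feed(SAMPLE_FEED)
print(channel["title"], "has", len(items), "items")
```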

There are various sorts of RSS formats, and even an emerging format that does the same thing, called 'Atom', but these are for the most part interoperable. It doesn't really matter which sort of RSS (or Atom) you use. These feeds are collected - or 'harvested' - by RSS file readers or aggregators. At first, it was only possible to read one feed at a time, but new services have emerged that combine lists of items from several feeds, creating topic-based or geography-based feeds.

A great deal could be said of RSS, but the essential points for this discussion can be summarized in a few words: RSS feeds describe content using XML, and they are gathered and reorganized by aggregators, creating content syndication, a means of sharing content quickly and efficiently.
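The aggregation step can be sketched in a few lines: take items already harvested from several feeds, keep those on a topic, and merge them into one list, newest first. A crude title keyword match stands in for real topic classification. The feed URLs and items are hypothetical.

```python
from email.utils import parsedate_to_datetime

# Hypothetical items already harvested from three feeds (keyed by feed URL).
harvested = {
    "http://blog-a.example/rss.xml": [
        {"title": "Notes on RSS 1.0", "pubDate": "Mon, 09 Feb 2004 10:00:00 GMT"},
        {"title": "Lunch", "pubDate": "Tue, 10 Feb 2004 12:00:00 GMT"},
    ],
    "http://blog-b.example/index.xml": [
        {"title": "Why RSS matters", "pubDate": "Wed, 11 Feb 2004 08:30:00 GMT"},
    ],
    "http://blog-c.example/feed.rss": [
        {"title": "Orkut invitations", "pubDate": "Thu, 12 Feb 2004 17:45:00 GMT"},
        {"title": "RSS and Atom compared", "pubDate": "Fri, 13 Feb 2004 09:15:00 GMT"},
    ],
}


def topic_feed(harvested, keyword):
    """Merge items from many feeds into one topic feed, newest first."""
    matches = []
    for source, items in harvested.items():
        for item in items:
            if keyword.lower() in item["title"].lower():
                # Remember which feed the item came from.
                matches.append({**item, "source": source})
    # RSS dates use the RFC 822 format, which email.utils can parse.
    matches.sort(key=lambda it: parsedate_to_datetime(it["pubDate"]), reverse=True)
    return matches


for entry in topic_feed(harvested, "rss"):
    print(entry["pubDate"], entry["title"], "from", entry["source"])
```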

Social Networks - the social network has a history almost as long as RSS, having been piloted with sites such as FireFly in the late 1990s. Aspects of social networks have long been present in dating services such as Match and Casual Kiss. They have gained increasing popularity over the last few years with the development of sites such as Friendster and Orkut.

A social network is a website whereby individuals describe themselves in a personal profile, reveal themselves through participation in communities, and form networks of interactions by declaring one another to be 'friends'. The expressiveness of a social network is created through these networks, as 'friends of a friend' may be introduced to each other as having common interests, even though they may not have met previously.
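The 'friends of a friend' introduction can be sketched as a two-step walk over declared friendships. This sketch assumes friendships are mutual, as they are once accepted on most such sites. It also records which shared friends could make each introduction. The member names are placeholders.

```python
from collections import defaultdict


def build_network(friendships):
    """Turn declared (a, b) friendships into an undirected adjacency map."""
    network = defaultdict(set)
    for a, b in friendships:
        network[a].add(b)
        network[b].add(a)
    return network


def suggest_introductions(network, person):
    """Map each friend-of-a-friend to the mutual friends who could introduce them."""
    direct = network.get(person, set())
    suggestions = {}
    for friend in direct:
        for candidate in network.get(friend, set()):
            if candidate != person and candidate not in direct:
                suggestions.setdefault(candidate, set()).add(friend)
    return suggestions


# Hypothetical members and declared friendships.
network = build_network([
    ("alice", "bob"), ("bob", "carol"), ("bob", "dave"),
    ("alice", "erin"), ("erin", "carol"),
])
print(suggest_introductions(network, "alice"))
```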

2. Making Contact

Bloggers have long been aware of the networking possibilities inherent in their medium. This was expressed early on through the use of a Blogroll on their website, a list of other blogs that the author of a given blog reads frequently. As a more precise accounting of the relations between authors was desired, bloggers began inserting into their RSS feeds a 'trackback' URL, so that one blog could notify another blog when a resource was cited or passed along. Techniques such as blogchalking were used to explicitly identify community affiliation.
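A trackback notification is an ordinary form-encoded HTTP POST sent to the cited post's trackback URL. The sketch below prepares such a request with Python's standard library but does not send it. The field names follow the TrackBack convention (url, title, excerpt, blog_name). All URLs are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(ping_url, post_url, title, excerpt, blog_name):
    """Prepare (but do not send) a TrackBack ping announcing that post_url cites another post."""
    body = urlencode({
        "url": post_url,          # permalink of the citing post
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    return Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


# Hypothetical URLs; the ping URL would be read from the cited blog's page or feed.
ping = build_trackback_ping(
    "http://blog-b.example/trackback/42",
    "http://blog-a.example/2004/02/reply",
    "A reply",
    "I disagree with part of this...",
    "Blog A",
)
print(ping.get_method(), ping.full_url)
# urllib.request.urlopen(ping) would actually send it.
```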

People like Seb Paquet and Lilia Efimova have for the last few years drawn this out explicitly. Efimova writes, "What weblogs create is two way awareness. If I read someone's articles online or check personal pages or "know" a person by reading comment in online discussion, in most cases this is one-way "getting to know": this person is not aware that I'm learning about him or her. Weblogs change it: if another blogger links to your weblog as least ones, he is likely to get on your radar." [Ref]

The relations formed between bloggers are similar to those formed between people in a social network. "This awareness creates something that I don't have a good name for. It's close to familiar stranger, but there is some kind of interaction (or, may be linking is similar to looking at person a physical environment, you don't expect a feedback, but another person is likely to notice that you have looked). I would say that this connection is one degree stronger than 'familiar stranger' [Ref] connection. And then this connection may turn into something stronger - 'weak-tied' conversations, with one more degree stronger. Then it may result in joint actions and "strong ties" at the end."

With the rise of social networking, bloggers have been looking more explicitly at the comparison between the two. There is a resistance, on the part of some bloggers, to creating an 'identity' on a social network: their blog is their identity. [Ref] Dina Mehta: A blog is "A profile that changes, grows, flows - not a cold resume or 'about me' page filled with past achievements and accolades - but is touchy-feely and one that says more about me through my thoughts, interests, preoccupations, rants, rambles and angst - that makes me more than just a consultant or a qualitative researcher - or a demographic statistic, 'female blogger from India'." [Ref]

3. A Broken Network

If the blogging network is becoming a social network, however, what explains the rise of social networking services? That there has been a rise is indisputable. Research by Judith Meskill revealed more than 100 social network sites early in 2004. [Ref] The launch of Orkut resulted in a storm of discussion across the internet. [Ref] It is evident that social networking sites are filling a need not met by the blogging network.

Social networking preserves two things that blogging does not (at least, not in an evident way): identity and community. In the blogging network, links are formed exclusively between content items. A blogroll links to blogs. A linkback links to posts. While the author of a given article may be carried from post to post, information about the author, such as the author's blogroll or network of friends, is lost, obtainable only through a tiring process of tracing the link back to the blog home page and blogroll. This is alleviated somewhat by the 'Share Your Feeds' service, but the link remains to content, not people.

And there are no explicit communities in the blogosphere, no way for an individual blogger to declare affinity with an entity rather than a piece of content. For many bloggers, this is a barrier. A blogger's content will become known only if it is shared by other bloggers, but other bloggers only read each other. If there is community in the blogging world, it surrounds the sites of major bloggers. The influence of these bloggers is exaggerated by the inability of others to penetrate this sphere, and is reflected in what Clay Shirky calls the 'power law' of blogging. [Ref]

Shirky depicts the power law as a normal state of affairs. "In systems where many people are free to choose between many options, a small subset of the whole will get a disproportionate amount of traffic (or attention, or income), even if no members of the system actively work towards such an outcome." However, this generalization rests on a non-random set of observations. No such concentration of 'power' occurs in the telephone network, for example: there is no small set of individuals that receives the bulk of the telephone calls.

If inequality is a natural phenomenon, it is a phenomenon that is natural only in an environment of scarcity. In broadcast media, few people have access to the airwaves, and hence there are stars. In a telephone network, everybody has access to the network, and no such stars emerge. The existence of a power imbalance in readership of weblogs is an indication of scarcity, and it is this scarcity - the opportunity to be heard - that is redressed through the formation of social networks.

"I've been playing around with Google a bit and I've seen some critiques of the inevitable and impending commercialization of the service, but very few real comments on a sociological level, i.e. we have this huge mass of people so desperate for a way to CONNECT that they put faith in what most of them have to admit, in their more self-aware moments, is a flawed attempt to do so. There's something in that which frightens me." [Ref]

4. There is no 'There' There

The rise of social networks has brought with it an almost immediate backlash. This backlash began with widespread disillusionment with sites such as Friendster and has rebounded with fresh critiques of Orkut. The criticisms can be summarized with a single phrase: there is no 'there' there.

Friendster foundered on the problem of fictional identities. "Currently, however, Friendster has a problem with 'fake users', generally imitators of celebrities. Since anyone with an email address can create a Friendster identity, some people make up these fake identities as a joke, which several others add to their list of friends." [Ref] This, in turn, was reflective of a deeper problem: "there is currently no way to maintain a consistent digital identity online. This is essential for most social systems, since most such systems must have a way to link actions to individuals over time."

In other words, an identity on Friendster - and also on Orkut - is empty, consisting of nothing other than the profile posted into the service. The many things that make up a person's identity - what I have elsewhere characterized as a 'profile' [Ref] - are missing. A Friendster identity is a hollow shell, and as a hollow shell is a prime candidate for spoofing.

Gary Lawrence Murphy writes of Orkut and social networks in general, "they are not social networks, only flat-taxonomy directories of questionaire replies, and badly designed questionaires at that (and) because they do not interoperate, because they cannot share data or interchange or allow identity migrations, they are essentially anti social, building protectionist walls around people (called 'clubs' or 'communities' but really meaning the opposite)." [Ref]

In a social network, the concept of friendship - and hence of the network - is empty. Thus abuse [Ref] is a natural first consequence. So is the idea of refining a taxonomy of types of friendship [Ref] [Ref]. Such efforts, however, do nothing to mask the fact that, in a social software system, there is nothing at the end except a short profile and, if we're lucky, some contributed comments. The richness and subtlety of a blog identity, mentioned above, is missing.

5. Distributed Social Software

It is perhaps a bit of an oversimplification to say this, but the problem could be summarized with the following observation: the blogging network and RSS link content, but not identities, while the social software network links identities, but not content. Compounding this problem, on the side of social software, is that a genuine network does not yet exist. Social software sites impose barriers to entry, and are not connected with each other.

The first step, therefore, toward addressing the shortfalls of the two systems is to break social software out of its site-specific mode. This has been suggested by numerous people already. Eric Gradman, for example, noting the problems already described, proposed 'distributed social software' having at its core a 'friend aggregator'. [Ref] Such a system already exists in the form of the Friend of a Friend project (FOAF).

The idea of FOAF is that it is like RSS for personal identities. To enter the network, a person creates a FOAF file. Like an RSS file, a FOAF file is an XML file that can be aggregated by harvesters. The FOAF file can be used to express the same content as a person's Friendster or Orkut profile would contain. But the creation of a FOAF file does not depend on membership in a specific site. Any person can create a FOAF file using a FOAF generator and place it on their home page, then make it known by submitting its URL to an aggregator or by linking to it from their home page. [Ref]
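As a rough illustration of what a FOAF generator produces, the sketch below writes a minimal file. It uses the FOAF vocabulary's Person, name, mbox_sha1sum, weblog and knows terms. Each friend gets an rdfs:seeAlso pointer to their own FOAF file so that harvesters can crawl onward. The people, addresses and URLs are hypothetical, and a real generator would offer many more fields.

```python
import hashlib
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)


def mbox_sha1sum(email):
    """Hash the mailto: URI so a person can be identified without exposing the address."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def make_foaf(name, email, weblog, friends):
    """Build a minimal FOAF document; friends are (name, email, foaf_url) tuples."""
    root = ET.Element(f"{{{RDF}}}RDF")
    me = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(me, f"{{{FOAF}}}name").text = name
    ET.SubElement(me, f"{{{FOAF}}}mbox_sha1sum").text = mbox_sha1sum(email)
    ET.SubElement(me, f"{{{FOAF}}}weblog", {f"{{{RDF}}}resource": weblog})
    for friend_name, friend_email, friend_foaf in friends:
        friend = ET.SubElement(ET.SubElement(me, f"{{{FOAF}}}knows"), f"{{{FOAF}}}Person")
        ET.SubElement(friend, f"{{{FOAF}}}name").text = friend_name
        ET.SubElement(friend, f"{{{FOAF}}}mbox_sha1sum").text = mbox_sha1sum(friend_email)
        # rdfs:seeAlso tells a harvester where to find the friend's own FOAF file.
        ET.SubElement(friend, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": friend_foaf})
    return ET.tostring(root, encoding="unicode")


print(make_foaf("Jane Example", "jane@example.org", "http://jane.example.org/",
                [("John Example", "john@example.org", "http://john.example.org/foaf.rdf")]))
```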

The FOAF format, in addition to defining personal profiles, can define communities as sets of references to individual FOAF files. The FOAF project home page describes the use of FOAF to define affinity groups, project teams, and more. As Edd Dumbill writes, "I hope you can see the power that merging offers. By aggregating and merging the FOAF files, you can achieve the same effect as operating a centralized directory service, without any of the issues of single points of failure or control. This is a very attractive feature for many communities for which decentralized or devolved control is a requirement, either because of political structure or sheer size." [Ref]
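The merging Dumbill describes depends on recognizing the same person across files. A common FOAF practice is to key people on an identifying property such as mbox_sha1sum. The sketch below merges person descriptions from two hypothetical files on that key. The short "hash-..." strings stand in for real 40-digit SHA-1 values. A full RDF toolkit would do this more generally.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
NS = 'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/"'

# Two hypothetical FOAF files; "hash-jane" etc. stand in for real SHA-1 values.
JANES_FILE = f"""<rdf:RDF {NS}>
  <foaf:Person>
    <foaf:name>Jane Example</foaf:name>
    <foaf:mbox_sha1sum>hash-jane</foaf:mbox_sha1sum>
    <foaf:knows><foaf:Person>
      <foaf:name>J. Example</foaf:name>
      <foaf:mbox_sha1sum>hash-john</foaf:mbox_sha1sum>
    </foaf:Person></foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

JOHNS_FILE = f"""<rdf:RDF {NS}>
  <foaf:Person>
    <foaf:name>John Example</foaf:name>
    <foaf:mbox_sha1sum>hash-john</foaf:mbox_sha1sum>
    <foaf:knows><foaf:Person><foaf:mbox_sha1sum>hash-jane</foaf:mbox_sha1sum></foaf:Person></foaf:knows>
    <foaf:knows><foaf:Person>
      <foaf:name>Mary Example</foaf:name>
      <foaf:mbox_sha1sum>hash-mary</foaf:mbox_sha1sum>
    </foaf:Person></foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def merge_foaf(documents):
    """Merge person descriptions from several FOAF files, keyed on mbox_sha1sum."""
    people = {}
    for doc in documents:
        for person in ET.fromstring(doc).iter(f"{{{FOAF}}}Person"):
            key = (person.findtext(f"{{{FOAF}}}mbox_sha1sum") or "").strip()
            if not key:
                continue  # without an identifying property we cannot merge safely
            record = people.setdefault(key, {"names": set(), "knows": set()})
            name = person.findtext(f"{{{FOAF}}}name")
            if name:
                record["names"].add(name.strip())
            for friend in person.findall(f"{{{FOAF}}}knows/{{{FOAF}}}Person"):
                friend_key = (friend.findtext(f"{{{FOAF}}}mbox_sha1sum") or "").strip()
                if friend_key:
                    record["knows"].add(friend_key)
    return people


print(merge_foaf([JANES_FILE, JOHNS_FILE]))
```

Note that John appears under two names in the two files but ends up as a single record. This is the effect of a centralized directory, achieved without one.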

This phase of the transition to the Semantic Social Network has already begun. Social networking sites such as Tribe have begun to realize that they must allow people to create FOAF profiles from their network profiles. [Ref] The easy creation of FOAF files will have the same impact on social networking as a tool like Blogger [http://www.blogger.com] had on blogging. It will no longer be necessary to own a website to participate. It is, no doubt, only a matter of time before blogging software generates FOAF files, for they would otherwise lose the advantage to social networking sites.

6. Who I Like is What I Read

One of the major barriers to the use of a FOAF file is in the creation of a list of friends. This is the service social networking provides: it is possible to add a friend (usually on request) by clicking a few buttons. But in the wider FOAF world, it has until recently been necessary to enter information about friends manually into a web-based form.

Such a construction of a list of friends, additionally, suffers from the same weakness as a list of friends in Friendster or Orkut. It is artificial. But to be of value, a social network must represent some sort of interaction. As David Weinberger writes, "But if you want to get at the real social networks, you're going to have to figure them out from the paths that actual feet have worn into the actual social carpet." [Ref]

Jon Udell notes, "I realized long ago, for example, that maintaining a blogroll by hand was going to be a losing proposition, and switched to a system that simply echoes the list of feeds to which I'm currently subscribed." [Ref] In the same spirit, FOAF autocreation enables a person to generate a list of friends from their own OPML file. Since an OPML file lists the blogs a person reads (their blogroll), and hence (via content) a list of people, it is a straightforward matter to read an OPML file and generate a list of friends.
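Reading the blogroll out of an OPML subscription list is the easy half of this process. The sketch below collects every outline that carries an xmlUrl attribute, including outlines nested inside folders. It returns each subscription's title, feed URL and home page. The OPML content is hypothetical. Mapping each feed to a person is the separate, harder step.

```python
import xml.etree.ElementTree as ET

# A hypothetical OPML subscription list with one folder.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>My subscriptions</title></head>
  <body>
    <outline text="Blog A" type="rss" xmlUrl="http://a.example/rss.xml" htmlUrl="http://a.example/"/>
    <outline text="Tech">
      <outline text="Blog B" type="rss" xmlUrl="http://b.example/index.rdf" htmlUrl="http://b.example/"/>
    </outline>
  </body>
</opml>"""


def blogroll_from_opml(opml_text):
    """Return (title, feed URL, home page) for every subscription, including nested folders."""
    roll = []
    for outline in ET.fromstring(opml_text).iter("outline"):
        feed = outline.get("xmlUrl")
        if feed:  # folder outlines have no xmlUrl
            roll.append((outline.get("text"), feed, outline.get("htmlUrl")))
    return roll


for title, feed, home in blogroll_from_opml(SAMPLE_OPML):
    print(title, feed, home)
```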

Such a solution is not complete, however. For one thing, it would need to be incorporated into other tools; the reliance on a specific website to author a FOAF file creates unnecessary complexity. Additionally, this merely pushes the problem of creation back one step: it is still necessary to author an OPML file. Stand-alone OPML generators exist, and OPML may also be generated automatically by blog-reading software. But what is needed, specifically, is a service that (a) creates a personal profile, (b) creates a blogroll, and (c) hence creates the fully descriptive FOAF file.

Jon Udell again, "The reality is that every document published to the Web can help to define a relationship -- by linking to, quoting from, or more subtly supporting or refuting another document. Of these actions, linking is the only one that's always unambiguously machine-readable." [Ref]

Use of the FOAF autogenerator also reveals a second problem. Links in OPML files are to RSS files, or in other words, to content. A further step is required to locate FOAF files, and while the autogenerator tries to do this, the results are spotty at best. This points to a problem noted earlier: RSS files do not preserve identity.
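The extra step usually means fetching a blog's home page and looking for an autodiscovery link. A common convention was a `<link rel="meta" type="application/rdf+xml">` element in the page head. The sketch below assumes that convention. It may also pick up non-FOAF RDF files, and it finds nothing when a blog publishes no such link, which is one reason the results are spotty. The page and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FoafLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="meta" type="application/rdf+xml"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        kind = (a.get("type") or "").lower()
        if "meta" in rels and kind == "application/rdf+xml" and a.get("href"):
            # Resolve relative hrefs against the page's own URL.
            self.found.append(urljoin(self.base_url, a["href"]))


def find_foaf_links(html, base_url):
    parser = FoafLinkFinder(base_url)
    parser.feed(html)
    parser.close()
    return parser.found


page = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/rss.xml">
<link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf" />
</head><body>...</body></html>"""
print(find_foaf_links(page, "http://a.example/blog/"))
```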

7. Identity in RSS

It is perhaps a quirk of history, but the original definition of RSS did not include 'author' as a field in its item elements (using only 'title', 'link' and 'description'). Hence, in RSS as originally designed, unless the item being described was authored by the creator of the RSS feed, author information disappeared from references almost immediately.

Later iterations of RSS (and specifically, RSS 1.0) address this through the use of the Dublin Core 'creator' field. [Ref] But while useful, the Dublin Core field points to people not via any sort of XML, but by using their names. The Atom syndication format goes further, creating a 'person' construct. [Ref] But while a URL is allowed, it is not clear that this URL points to an XML file.

RSS files should explicitly point to the authors of items via their FOAF files. Though a reference to an HTML file which contains pointers to FOAF files (i.e., 'autodiscovery') will do in a pinch, this is a needless complication. FOAF information can and should be available at the time an article is created (after all, authors create their own items) and may easily be embedded in an RSS file describing that item. Aggregators of these files can pick up a FOAF pointer as easily as they pick up the URL of the article, and so if the article is cited in a blog, a pointer to the author's FOAF can and should be stored along with the reference to the article itself.
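As a sketch of this recommendation, the code below publishes an RSS item that carries a pointer to its author's FOAF file and to the FOAF files of cited authors. An aggregator then keeps those pointers alongside the article link. No standard element is assumed here. The `ssn` namespace and the `authorFoaf` and `citedFoaf` element names are hypothetical placeholders, and the URLs are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names: nothing standard defines them.
SSN = "http://example.org/ssn#"
ET.register_namespace("ssn", SSN)


def make_item(title, link, author_foaf, cited_foafs=()):
    """Build an RSS <item> that points to the FOAF files of its author and cited authors."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, f"{{{SSN}}}authorFoaf").text = author_foaf
    for url in cited_foafs:
        ET.SubElement(item, f"{{{SSN}}}citedFoaf").text = url
    return ET.tostring(item, encoding="unicode")


def harvest_item(item_xml):
    """What an aggregator keeps: the article link plus who wrote it and who was cited."""
    item = ET.fromstring(item_xml)
    return {
        "link": item.findtext("link"),
        "author": item.findtext(f"{{{SSN}}}authorFoaf"),
        "cited": [e.text for e in item.findall(f"{{{SSN}}}citedFoaf")],
    }


published = make_item("A reply", "http://jane.example.org/2004/02/reply",
                      "http://jane.example.org/foaf.rdf",
                      ["http://john.example.org/foaf.rdf"])
print(harvest_item(published))
```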

As an aside: there is some ambiguity in the semantic web community about how to express pointers to XML files in metadata. As I outlined in a previous article [Ref] there is no clear means of determining, without actually opening the file, whether a given URL points to an HTML or an XML file. As Tim Bray wrote, "Everyone agrees that when you get confused about what's being identified, this is a bad thing and makes the Web less useful. As TimBL has said repeatedly: a resource can't be both a person and a picture of a person. Unfortunately, such ambiguity is not a condition that Web software can detect." [Ref]

In the same article, I proposed distinguishing between references to XML and references to HTML in the use of a URI, that is, by how it is placed in a metadata file. Specifically, I suggested that references to XML be contained in 'about' fields of metadata tags or address tags. Thus, for example, the 'person' element in an Atom file would point to a FOAF file as follows: and embedded links as follows: person's name. It seems reasonable to adopt the same protocol here, thus allowing designers of new systems to unambiguously know where to find FOAF information.
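A consumer could apply this convention mechanically. It would treat a URI found in an 'about' attribute as XML to be harvested, and a URI in an 'href' attribute as an HTML page for people to read. The fragment below is a hypothetical illustration, not the markup of the earlier article. It also checks only plain, un-namespaced attributes, so RDF's namespaced rdf:about would need separate handling.

```python
import xml.etree.ElementTree as ET

# A hypothetical metadata fragment; the element names are illustrative only.
FRAGMENT = """<entry>
  <person about="http://jane.example.org/foaf.rdf">
    <a href="http://jane.example.org/">Jane Example</a>
  </person>
</entry>"""


def classify_references(fragment):
    """Apply the proposed convention: 'about' points to XML, 'href' points to HTML."""
    refs = {"xml": [], "html": []}
    for element in ET.fromstring(fragment).iter():
        if "about" in element.attrib:
            refs["xml"].append(element.attrib["about"])
        if "href" in element.attrib:
            refs["html"].append(element.attrib["href"])
    return refs


print(classify_references(FRAGMENT))
```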

8. What Needs to be Done

The Semantic Social Network is tantalizingly close. As I posted to Orkut: An effective network could be created with very little:

- Get people to create personal metadata files. FOAF might do the trick, if extended. The personal file contains the usual profile details found on a site like this, plus options to say 'likes', 'hates', 'cool' other people.

Needed: a simple FOAF management tool, the Blogger of FOAF, if you will, that people can use to create these files. A method for securing and verifying identity, to prevent fake FOAF files. A means of aggregating FOAF (already exists) for use elsewhere.

- Reference to FOAF in other documents. FOAF by itself (like Orkut by itself, or any other sterile SN environment) serves no purpose. Place FOAF links into content metadata (such as RSS) and now the content metadata system and the SN metadata system can interact. Aggregators harvesting both FOAF and RSS have enormous expressive power.

- Extend FOAF, or RSS, or create a new type of format, for individuals with FOAF identities to attach values (like, dislike, loved) to content items with RSS identities. Add to the mix. Aggregate.
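Whatever format eventually carries these values, the aggregation step reduces to a simple data structure. The sketch below records a FOAF identity, an RSS item link and a value for each rating. It keeps only each rater's most recent value per item, then tallies the values per item. The record layout and the URLs are hypothetical, and the allowed values are the examples named above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

VALUES = {"like", "dislike", "loved"}  # the example values named in the text


@dataclass(frozen=True)
class Rating:
    rater_foaf: str  # who is attaching the value (a FOAF identity)
    item_link: str   # what is being rated (an RSS item's link)
    value: str       # one of VALUES


def tally(ratings):
    """Aggregate harvested ratings into per-item counts, keeping each rater's latest value."""
    latest = {}
    for r in ratings:
        if r.value not in VALUES:
            raise ValueError(f"unknown value: {r.value!r}")
        latest[(r.rater_foaf, r.item_link)] = r.value
    totals = defaultdict(Counter)
    for (_, item_link), value in latest.items():
        totals[item_link][value] += 1
    return dict(totals)


JANE, JOHN = "http://jane.example.org/foaf.rdf", "http://john.example.org/foaf.rdf"
POST1, POST2 = "http://blog-a.example/post1", "http://blog-a.example/post2"
ratings = [
    Rating(JANE, POST1, "like"), Rating(JOHN, POST1, "loved"),
    Rating(JANE, POST2, "like"), Rating(JANE, POST2, "dislike"),  # Jane changes her mind
    Rating(JOHN, POST2, "like"),
]
print(tally(ratings))
```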

I also wrote that SSNs work when...

- comments in boards point to profiles created by (and owned by) the people they describe, not isolated centralized islands like Orkut, Friendster, and the 100 or more similar separate sites.

- references to such FOAF files (or something similar to FOAF files) are attached to content or content metadata (such as RSS), identifying the author.

- we can go to an aggregator and say, "Find all the articles by people that Clay Shirky likes." or "Find all the articles about RSS by people who don't like Dave Winer." or "Find all the articles on Google by people who wrote articles that Jon Udell likes."

- influence is determined not by the number of friends you can sign up, but by the usefulness of results produced by using your profile and preferences in a search.

- all of this is used to do something, and not merely to facilitate chatter.
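Queries like these become simple joins once an aggregator holds both kinds of data. It needs who likes or dislikes whom, from FOAF-style files, and who wrote what, from RSS items that point to their authors. The sketch below answers the first two query shapes over a small hypothetical dataset. The names are placeholders, not real people's preferences.

```python
# Hypothetical harvested data. Relations come from FOAF-style files,
# authorship from RSS items that point to their authors.
likes = {"alice": {"bob", "carol"}, "dave": {"carol"}}
dislikes = {"erin": {"dave"}, "bob": {"dave"}}
articles = [
    {"author": "bob", "title": "Harvesting RSS", "topics": {"rss"}},
    {"author": "carol", "title": "FOAF in practice", "topics": {"foaf"}},
    {"author": "erin", "title": "RSS versus Atom", "topics": {"rss", "atom"}},
    {"author": "frank", "title": "RSS tips", "topics": {"rss"}},
]


def by_people_liked_by(person):
    """'Find all the articles by people that <person> likes.'"""
    liked = likes.get(person, set())
    return [a["title"] for a in articles if a["author"] in liked]


def on_topic_by_critics_of(topic, person):
    """'Find all the articles about <topic> by people who don't like <person>.'"""
    critics = {p for p, disliked in dislikes.items() if person in disliked}
    return [a["title"] for a in articles
            if topic in a["topics"] and a["author"] in critics]


print(by_people_liked_by("alice"))
print(on_topic_by_critics_of("rss", "dave"))
```

One design choice is worth noting. Here "don't like" is read as an explicit dislike rather than the mere absence of a like, so an author with no declared relation (frank) is left out of the second query.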

9. The SSN Application

Possibly even by the time I finish writing this paragraph, the first semantic social network applications will begin to roll off the production line. An SSN application will combine these major functions:

- It will be a personal profile generator, like a social software site, that allows the user to create and maintain a FOAF file

- It will be a blog / comment / content reader that allows the reader to easily make notes about the item being read. In other words, it will contain the equivalent of Radio's 'Blog This' button. [http://radio.userland.com/] In addition to typing a blog entry, readers may add the blog to their blogroll (OPML), indicate an affinity with the author or submit an evaluative rating.

- It will be a search / aggregation tool that uses FOAF and RSS aggregators to satisfy queries based not only on the content of an article but on information about the authors of those articles.

- It will be an authoring tool that publishes not only blog posts but also an RSS file; the published RSS file will (automatically) include references to the author's FOAF and the FOAFs of any cited authors.

- It will allow the user to create and join communities, automatically tagging contributions to those communities (such as my posts to Orkut) so they may be syndicated to other readers, and so that the body of a person's contributions may be seen in a single place.

And, of course, much more, because this basic functionality merely gets the Semantic Social Network off the ground. What lies beyond is limited only by the imaginations of software designers.

10. Why We Need a Semantic Social Network

What David C. Foreman says of learning software could be said of content-based software in general: "Most training and educational professionals have focused their efforts on learning in individuals, not organizations. [but] The competitive strength of companies and even countries is now tied not to physical resources but to the knowledge and skills of people. And these people do not work in isolation within companies; they work in teams, informal groups and in multiple roles." [Ref]

Foreman describes two major 'levels' of organizational learning: a 'contribution level', in which knowledge is created and input into the system, and a 'multiplier level', in which knowledge is shared and reshaped by a community, making the aggregation of the contributions greater than the individual parts.

At the contribution level, we see the advantages already cited of the blogging system: individuals learn by reading from others, they collaborate through blog conversations, they leverage what they know with new practices, and they build on the work of others to innovate. And the benefits of social networks can be seen at the multiplier level. People mentor each other through the formation of communities, they network and form new organizations, and they inspire each other by example and input.

Foreman: "The framework's levels and organizational capabilities are taken from the nexus between instructional design and organizational development where learning becomes both meaningful to the individual and to the organization. Some training and HR professionals may not want to enter this new organizational learning arena because it is ill-defined, complex and high-stakes. It is safer to retreat to our specialties. But organizational learning has great promise precisely because it is so important, yet poorly understood. If we come out of our silos, we have a lot to contribute."

What is true of learning organizations is true of online community in general. Content without community cannot achieve its full impact. Community without content is empty. It is only through the joining or fusing of these two levels that the full advantages of both worlds may be realized.