Community Blogging

Feb 24, 2005
By Stephen Downes

My talk from the Northern Voice conference, February 19, 2005, Vancouver. An analysis of community as it emerges in blogging: how it is formed, how it should reshape the blogosphere, and how it can be implemented (quite easily) technologically. And along the way, deflating a few pet concepts of the blogerati, such as the value of the long tail and the utility of tagging.

PowerPoint Slides (9 meg)
MP3 Audio (6 meg).

Thank you. My name is Stephen Downes. I come from the other coast of Canada, Moncton, New Brunswick. I work for the National Research Council, which means that I'm a government employee, which means that I don't own any of my own words, and I'm here to talk about community blogging.

Now I'm not going to talk about a bunch of people all getting together and blogging on the same website, as some people represent community blogging, because I don't really find that too interesting. What I'm going to look at is the relation between community and blogging, how blogging becomes a community, how a community becomes a group of bloggers.

Basically I have four sections. I'll talk a wee bit about what constitutes a community. I'm going to rant and rave against the concept of the long tail. I'm going to explore Wittgensteinian theories of meaning. I'm going to talk about distributed network semantics.

Now this may sound like it has nothing to do with community, but my intent here is to try to reframe your thoughts on what community is, what community on the web is, and what a community of bloggers is.

So we ask what constitutes a community, and we look around in the real world, and we find communities pretty easily, we just look for a city, a town, a village, a neighborhood, and what creates a community, typically, in the real world, is proximity. We are part of a community because we live in pretty much the same place that other people live. So even though you may have nothing whatsoever to do with your neighbours, you are still part of their community.

And for a long time the concept of community online was based on the same concept. And we heard about it from people like Hagel and Armstrong, even Cliff Figallo, a bit, Howard Rheingold, a little bit. But the idea was that community was a place, just like a town or a village. Online we would call it a website, or a portal, or these days, because we're so much more advanced, a social networking site.

And the model, as defined by Hagel and Armstrong, is, you set up this site, you give it a topic, you bring in some people, you give them a way to communicate, you retire to the Cayman Islands. It didn't work out that way.

Now my field of study is online learning. That's where my expertise lies, and I actually don't really know very much about social networks or blogs or things like that. In online learning... learning - schools, universities - they're almost the prototypical communities, aren't they? You gather all these people into one place, you organize them into classes, you get a bunch of subjects together, you slice and dice the range of knowledge that people are supposed to have in order to become productive and obedient members of society.

And online we see the same sort of thing. In online learning, we have this thing called the Learning Management System, or the new 2000 version, the Learning Content Management System, and again, it's a site, it's a place, you log on, you get your class, and your class has a bunch of lessons, and you go to the special room where you're allowed to talk to other people, and call that the chat area, or the discussion area.

And in social networking it's much the same thing, right? If you belong to Orkut, you log on to orkut.com, or orkut.org, whatever it is, I never remember, I just type 'orkut'. Or you log on to Friendster, or LinkedIn, or even Flickr, and being a community means going to that place, and being part of that community is to a large degree a matter of proximity. And in some cases persistence; I got an email the other day, "This is your fourth reminder that you have not responded to this LinkedIn invitation," and I'm sitting there, thinking, "Don't they have a clue yet?"

But I challenge the perception that these are communities, despite what it says on their home page. All they are is proximity; they're places where there are people in the same place. But I don't think that that's what defines a community.

A lot of people have written about community online, and I'm not going to rehash what they said, except on this slide. But look at it. Cliff Figallo talks about the relationships and the exchange of commonly valued things, among other things. Bock talks about common interests, frequent interaction, identification, and Paccagnella talks about articulated patterns of relationships, roles, norms. And these accounts of community are pretty typical, they're pretty widespread, and you'll find them in most of the work that you read about online community.

Now I want to draw out from these descriptions two major elements that I think are probably definitive of community. First of all, the idea that there's a network. Now a lot of people capture that by saying people can interact, people communicate, there's a place for discussion. But the central thing here is that there is, in some sense, a relation among the people; it's not mere proximity. But they are connected in some way.

And the second thing, and the important thing, in my mind, is semantics, the idea that these relations are about something, that the people in the community share a common interest, common values, a set of beliefs, an affinity for cats, or beekeeping.

Now we have a pretty good understanding of networks; there's been a lot of work in the theories of networks. We have a much less refined sense of meaning. Fortunately, one of my other jobs is as a philosopher, so I spent many years studying meaning. I never thought that that would be useful. In fact there was a little sign on the wall where I took philosophy, it said, "You are not going to get a job. Give up now." They made us sign a little piece of paper, "I recognize that this will not prepare me for any future employment."

So let's think about this a little bit. I'm going to come back to meaning but I want to rant and rave a little bit. Because the long tail, as we are told, repeatedly, is a property of networks, and in particular scale free networks, and the idea here is, you get a bunch of people and you start them linking to each other. And you can set this up randomly, and people start linking to whatever's handy, right? And if you do that you're going to create a set of links.

A power law curve, as described by Shirky.

One of the neat things about this is that you get the phenomenon of six degrees, right? If you go from one person to another person to another person you can get anywhere in the network in just a few hops, and in a group this size, probably two or three hops. Now what happens in a network of this type is that some people get lots of links and other people get just a few links. If you look in the world of blogging, for example, and I'm sure you've seen this written about elsewhere, a site like Boing Boing or Instapundit or Scripting News, they'll get like thousands and thousands of links, and a site like NewsTrolls, which is a site that I run, gets, well, one. And so you get this curve, it's called a power law, and there's what it looks like, and on the one hand you have Instapundit with all those links and then you get the long tail with thousands and thousands of sites with one or two or fewer links each. Technorati: zero links from zero sources.

Now what creates the power law phenomenon? Well there are two major things that have been identified. One thing is growth. The network grows over time. And the other thing is preferential attachment. Now what that means is, you're out there, you're looking for something to link to, because you're a blogger, and you've just listened to Tim Bray and others, and they said "link often" and you think "OK, that's a good idea, I need something to link to now" and you go out and you look on the blogosphere, and what are you going to link to? Well, if you just go out looking on the blogosphere you are probably going to find Instapundit, or you're going to find Scripting News, or you'll find Scoble's site, or whatever, and, OK, it's better than the newspaper, so you link to them.

And so, two things are happening here, right? These people are getting linked to mostly because they were first. And because they were first, there was a time when they were the only things to link to, so people linked to them, and then as time went by they were the ones who had the most links and so consequently they were most likely to be found by new people. So it's like, you know, the way a tree grows, right? You have a trunk of a tree, and that's where all the action is, it's not because the trunk is better than any other part of the tree, it's just the trunk was the part that was first. And all the rest of the tree has to attach itself to the trunk because where else is it going to attach itself to?
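The two ingredients just described, growth and preferential attachment, can be sketched in a few lines of Python. This is a minimal illustration, not anything shown in the talk: the function name `grow_network`, the starting pair of sites, and the rule that each new site makes exactly one link are all assumptions chosen to keep the example small.

```python
import random

def grow_network(n_sites, seed=0):
    """Return a list of inbound-link counts, one per site.

    Growth: sites are added one at a time.
    Preferential attachment: each new site links to one existing site,
    picked with probability proportional to how often that site already
    appears in the pool of things that can be found.
    """
    rng = random.Random(seed)
    inbound = [0] * n_sites
    inbound[0] = inbound[1] = 1      # assumed start: two sites linking to each other
    pool = [0, 1]                    # a site appears here once per link it is part of
    for new_site in range(2, n_sites):
        chosen = rng.choice(pool)    # popular sites are simply easier to find
        inbound[chosen] += 1
        pool.append(chosen)          # the chosen site becomes even easier to find
        pool.append(new_site)        # the newcomer enters the pool once
    return inbound

inbound = grow_network(2000)
ranked = sorted(inbound, reverse=True)
print("most linked:", ranked[:5])
print("sites with at most one inbound link:", sum(1 for c in inbound if c <= 1))
```

Nothing in the loop looks at what a site says. A site gets picked because it is already in the pool many times, which is the point being made about being first.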

So people talk, and people have talked a lot, about the long tail and they've said "Worship the long tail, mine the long tail, the long tail is where the action is." And all of these people who are talking about the value and the virtue of the long tail have the unique quality of not being part of it. I live in the long tail. And I can say from my own personal perspective that people who are in the long tail would probably rather not be part of it. They simply want to be read.

You know, it's that old thing, it's a little off topic, but in Canada we have socialists and socialists always say, "We represent the working class" and that's kind of like the socio-economic way of saying "We represent the long tail." And they come out with these platforms and these policies that identify with the working people. Ask any of the working people, they don't want to be working people. And so, they're more likely to choose policies that support the rich people, because they all want to be rich, and when they're rich, they don't want to be pushed back into that long tail again. So I don't see a virtue in the long tail.

What the network looks like. From Valverde, Cancho and Sole.

Now when you have a long tail kind of network this is what it looks like. And there are different ways of representing this picture, I like this picture because it kind of gives you the sense, and right in the centre, that's the Instapundits and the Scripting News, and that's where everything started and everything's going to grow out from there. And then you get this clustering and branching phenomenon. But what you should notice about a network that looks like this is not simply that it's root and branches, it's hierarchical, isn't it? And the really important things are at the centre and you go way out, you see that little one sticking way out there, well that's me. No, further on. Further on.

But, thinking about how this comes to be. If everyone links to everyone there would of course be no long tail; we'd all be Instapundits. For good or bad. Preferential attachment occurs only because there is a shortage, and that's why we see the power laws existing in so many places. Why is there a power law in economic distribution in society? Well generally because there's a shortage of money. And if you want to make money you're attracted to the people who have money because that's the only place where you can get money. Online, it's a shortage of attention, of time. You do not have time to look at the links in four million or six million blogs. It's just not going to happen. Even Scoble can only handle a thousand blogs. He's got to be sitting there at night thinking, "God, I missed most of it."

So, you reach out to the closest thing you can find, but the other thing that creates a scale free network is that these attachments are, for all practical purposes, random. You reach out for what's available rather than what's good. And let me let my political stripes show a little bit, that's how Instapundit becomes Instapundit. He's available. He's easy to find.

Now my approach to this, and the reason why I rant and rave against the long tail, is that networks, on my picture, are not defined as a set of random connections - which, when you think about it, is a pretty stupid way to do it - but as a set of semantically organized connections. Because community is based on meaning, not randomness. Community as proximity - you're part of a community, the same community that your neighbour is part of - that's random connections. And so that's how you find yourself in some meeting with someone who has a completely different political point of view, and you're sitting there arguing with them about how the street ought to be run, because you've been put together randomly.

But community as networks of semantic relations, that's where the connections between members of the community are based on the meaning of those members or of the entities in the network. In other words, in order to create community, rather than a power law, we don't simply pick the most popular or the most available, we pick the most salient connection.

Well. What does that mean? How does something become the most salient connection? Well we need to analyze, or look at, at least for a moment, what a post means. Or what anything means. What a resource means. Now I say that, I'm saying, what does this post, or this person, or this resource, say about the world?

Frequency of tags for a given post.

Now one way, a very popular way, of trying to fix meaning to a blog post, is through tagging. Tagging has been the rage. I'm also anti-tagging. Why am I anti-tagging? Well, take a post, any post, and ask yourself, what would a graph of all the possible tags for this post look like? You are going to get a power law. So you have a post - somebody's written something about the Prime Minister - and so, you know, you have 'Martin', very popular, that would be a very commonly used tag, 'tax break', that might be a commonly used tag, 'my goldfish', maybe once, by somebody who didn't get the concept of 'prime minister'. You're going to get a power law curve of tags.

But the thing is, if you do it that way, then the meaning of the post becomes whatever tags are sitting there in the big spike. Right? So the post becomes, it means, that tag. But that tag contains only a part of the meaning of the post. It's a very narrow, one-dimensional look at something that might be a lot more complex.
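To make the "big spike" concrete, here is a small Python sketch that counts the tags readers might apply to one post and then keeps only the most frequent one. The tag list is invented for illustration, loosely following the 'Martin' example above; it is not real data.

```python
from collections import Counter

# Hypothetical tags applied by different readers to the same post.
tags_applied = [
    "martin", "martin", "martin", "martin", "martin",
    "tax break", "tax break",
    "budget",
    "my goldfish",
]

counts = Counter(tags_applied)

# The spike: the single most common tag stands in for the whole post.
top_tag, top_count = counts.most_common(1)[0]

# Everything else the readers said about the post is left behind.
dropped = [tag for tag in counts if tag != top_tag]

print("post 'means':", top_tag, f"({top_count} of {len(tags_applied)} uses)")
print("dropped:", dropped)
```

Reducing the post to `top_tag` is the one-dimensional reading being objected to: the other tags are also things people thought the post was about.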

Meaning.

Because the meaning of a post is not simply contained in the post. And this is where we have lots of trouble with meaning, because we all speak a language and we all understand words and sentences and paragraphs, and we think we've got a pretty good handle on how to say something about something else, and we have a pretty good handle on how to determine the meaning of a word. What does the word 'Paris' mean? Oh, no problem, right? 'Capital of France.' Right? But, you know, it might also be, 'Where I went last summer.' Or it might also be, 'Where they speak French.'

Wittgenstein.

When we push what we think of as the meaning of a word, the concepts, the understanding that we have, falls apart pretty quickly. And the meaning of the word, or the meaning of a post, is not inherent in the word, or in the post, but is distributed. It consists not just of what the word or the post talks about but in the set of relations and connections that this post has in its actual use, or as Ludwig Wittgenstein said, "Meaning is use."

How do you know the meaning of a word? You look at how people use it, you look at the context, you look at who uses it, where they use it, what the environment is in which it has been used, what other words are around it, and if you define meaning in that way, then the meaning of a word can't be stated as a set of necessary and sufficient conditions. It becomes something very different, something that Wittgenstein called 'family resemblances'. Now I was looking at the word 'community' and looking for definitions of community, one of the posts, or one of the definitions that I read was, "Well, community is like pornography. I don't know what it is but I recognize it when I see it." And it's that sort of sense of meaning inherent in a word, in a post, and indeed, in a person.

Two ways of looking at the world.

Because there are two ways of looking at the world. One way is to look at the world from the point of view of words. And you try to describe things. Another way of looking at the world is to look at the patterns. And try to see what emerges out of them. If you look at the diagram there, that little messy bit of lines and dots is a concept. Could be any concept, could be a blog post, could be the word 'Paris', could be your self-identity. Now if you use words, you cut through that cluster like a knife and you get a one-dimensional partial representation, you get an abstraction, but if you look at it from the point of view of patterns, then the meaning of that concept emerges from that cluster of entities and relations.

Now, emergence is a hard concept. And I'm not going to be able to deal with it properly here. So I'll just give you the quick example and admit that I'm fudging it. Emergence is like when you recognize Richard Nixon on your television set. Now Richard Nixon is not really on your television set, obviously. In fact, what's on your television set is a whole bunch of little dots. But the thing is, those dots are organized in such a way that when you look at the television set you recognize that organization of dots as being similar in form to Richard Nixon. And indeed, for people like me, I've never met Richard Nixon, that's the only understanding of Richard Nixon that I have, is through this repeated pattern or organization of discrete entities.

Now what's important here is that the particular dots don't matter, the particular colour and the particular properties of the dots don't matter. Richard Nixon is not in the pixels. Richard Nixon is in the organization of the pixels. And so we say the image of Richard Nixon is emergent from the pixels. Now what's important here, in my mind, although it's a little bit peripheral, is, this doesn't happen without a perceiver, without the capacity to recognize this pattern as being Richard Nixon. Take somebody who has just been born recently, wasn't around during the 70s, doesn't know why scandals always have the word 'gate' attached to them, show them a picture of Richard Nixon and "yeah, some guy. He's got a bit of a sweaty upper brow. But I don't know who it is." You have to have a context in which to recognize a pattern in a network.

When we use words, that warps it, because we're going after the big spike, and words actually distort because they pull the pattern into themselves, and people start thinking, 'well the word is the concept,' and 'the concept is the word.' Of course it isn't, but because we're focused on this big spike and because the meaning of the concept is being derived from the word, the meaning becomes the word.

The meaning of a post.

If we think of meaning as use then what is the meaning of a blog post? What does a blog post talk about? It's not contained in the post. Rather, it's contained in the network of relations in which the post finds itself. In the referrers. In the use. In the connections with other things. In evaluations of the post. A whole variety of different connections, different relations, are possible which could, and in my opinion will, be used to characterize an individual post.

So if we look at the two pictures of meaning of posts, on the one picture, if we think of meaning as inherent in the post and maybe describable in words we get an organization of meanings that looks very much like the network that's formed through random connections, because the word, when attached to a concept, a post, is more or less random. I was looking on the Northern Voice website and they said, "When you're tagging this, please use..." and then they give you your string. They could have used any string. They use, I forget what they used, 'northernvoice' or whatever. But they could have used 'qxdytz'. That would have worked just as well. It's random. And you end up with clustering that looks just like one of these scale free networks. But if meaning is thought of as distributed, as being derived from the relations and not just the content of the word, then you get a very different looking network, a very different pattern.

Now why does this matter? It matters this way. If we're deriving meaning and connections and communities in a random fashion everything flows from the big spike. Scoble was up here, saying, "My friend was saying, I want you to link to me." And, he said, "That's not how it works. Create something of value," he said. Right? "And I will decide whether it's worth linking to." That's the big spike telling the long tail what to do. Isn't it? That's what happens when meaning derives from the centre. And if you push it, that sort of organization and arrangement requires control. Look at Technorati Tags. Now, we've already gotten some tag spam, and we've already gotten some structured vocabulary in Technorati Tags, and eventually somebody will come out and propose an ontology of Technorati Tags, a taxonomy, and they will say, "Everyone should do it this way." And anyone who doesn't, well, they're being chaotic, they're being disruptive.

But if the idea emerges from the pattern of connections between individuals there's no one in control. Scoble can't tell me what to write in my blog and it doesn't matter whether he links to me or I link to him. And the dynamics in such a network are completely different. This works if you have freedom. This works if nobody tells you how to tag. This creates order and relevance and meaning through diversity, not conformity. Two very different pictures of community.

So how do we pull this off? How do we kill the big spike? How do we transform tagging from something that people can use to spam to something that can actually get us to the point where we have meaningful communities?

Well we come back to online learning. Because again, that's what I know about. And in online learning what's happening is -- and it's very slow and there's a lot of resistance because people who are part of the big spike don't want to let go, right, and the people in the online learning world who are part of the big spike are university presidents, they're publishers, and authors, the top researchers in the field, whoever, and they don't want to let go, in a classroom, if you have the teacher, that's the big spike, and you have all the little students there, the long tail -- but what's happening in online learning, very slowly, very reluctantly is a shift from centralized place-based networks into something more distributed. And we're getting to the point where learning resources are available not from a given place, not from a given authority, but from out there on the network. And what we're after, at least some of us, those of us who are in the long tail, what we're after is a way of being able to recognize - and something that doesn't require tagging six million items - the posts, the resources, that are salient to us, as individuals.

Now, people don't get that in the online world, and I don't think they get that in social networking, and so we always talk about, "Look, we got to standardize, we got to standardize, it's the only way the system will work is standardized," and I go to online learning conferences and I tell them, "Well, the most popular form of XML in the world today is RSS, there is no standard, in fact there are nine or so different varieties, according to Mark Pilgrim, and who knows what there will be tomorrow? But that's the thing that's working."

Educational communities the old way, nice neat topics and classes and so on, but this type of structure both in schools and in the blogosphere, where you have the flow coming from the top, is ripe for abuse. There's another one from J.D. Lasica just came out today, about "Influence Peddling in the Blogosphere". And of course we heard mention earlier of Raging Cow and the Lincoln Fries. Eventually these companies are going to get good at this. Right now they're screamingly bad. But they are eventually going to become good. 43 Things had the entire blogosphere fooled for a couple of weeks. And, and it sort of fell apart. Eventually there will be things that don't fall apart. I look at the Wall Street Journal opinion columns, and they are defining from the top down. There's a whole bunch of people out there who echo the words that they see in these opinion columns. They don't know what they mean, because there is no context. They're just echoing the words. And it just becomes a way for the Wall Street Journal to broadcast.

Future learning environments place the individual at the centre - that's where it says 'Future VLE' - and a range of resources that they bring in, or that they aggregate, from a wide variety of different sources. Notice he has 43 Things on there. That actually places that diagram at a precise moment in history. And if you look at community in this picture, then you're able to draw out a theory of community, where a community is defined by three major components. First, as a means of organizing input and experience. Second, as a means of putting that experience into context. What does it mean to you here now? And then third, and very importantly, as a means of taking what you've done, what you've remixed, what you've repurposed, and putting it out there so it can become part of someone else's meaning. Just imagine how the copyright barons look at this model of organization, right? Community is antithetical to copyright, and conversely.

The idea here is that the community is defined as the relations between the members where the relations have semantical value, where that semantical value is defined by the relations. And I know it sounds like bootstrapping, but we've been doing that throughout history. People exist in relations to other people, to things, to resources, even to spaces.

So how do we pull this off? We can't just blast four million blogs, eight quadrillion blog posts, out there, and hope Technorati will do the job, because Technorati won't do the job, because Technorati represents the whole four million things and I'm not interested in three million nine hundred and ninety-nine of those. What has to happen is this mass of posts has to self-organize in some way. Which means there has to be a process of filtering. But filtering that is not just random. And filtering that isn't like spam blocking. Filtering has to be a mechanism of determining what it is we want, because it's a lot easier to determine what we want than what we don't want.

So how do we do this? We create a representation of the connections between people and the connections between resources. The first pass at this I described in a paper a couple of years ago called "The Semantic Social Network" and the idea, very simply, is we actually attach author information to RSS about blog posts. It kills me that this hasn't happened. Because this is a huge source of information. And all you need to do is, in the 'item', in, say, the 'dc:creator' tag, put a link to a FOAF file. And all of a sudden we've connected people with resources, people with each other and therefore, resources with each other. And that gives me a mechanism for finding resources that is not based on taxonomies, is not based on existing knowledge and existing patterns, but is based on my placement within a community of like-minded individuals. Now Instapundit stuff probably isn't going to filter through to that, but really cool stuff, like Dave Pollard stuff, will.
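A minimal Python sketch of what that buys you, assuming the FOAF link is simply placed as the text of each item's `dc:creator` element. The feed below, the example.org URLs, and the helper name `posts_by_author` are all made up for illustration; only the Dublin Core namespace URI is the real one.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core elements namespace

# Hypothetical feed: each item's dc:creator holds a link to the author's FOAF file.
FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Post one</title>
      <link>http://example.org/posts/1</link>
      <dc:creator>http://example.org/people/alice/foaf.rdf</dc:creator>
    </item>
    <item>
      <title>Post two</title>
      <link>http://example.org/posts/2</link>
      <dc:creator>http://example.org/people/bob/foaf.rdf</dc:creator>
    </item>
    <item>
      <title>Post three</title>
      <link>http://example.org/posts/3</link>
      <dc:creator>http://example.org/people/alice/foaf.rdf</dc:creator>
    </item>
  </channel>
</rss>"""

def posts_by_author(feed_xml):
    """Map each author's FOAF URL to the links of the posts they wrote."""
    root = ET.fromstring(feed_xml)
    index = defaultdict(list)
    for item in root.iter("item"):
        creator = item.findtext(f"{{{DC}}}creator")
        link = item.findtext("link")
        if creator and link:
            index[creator.strip()].append(link.strip())
    return dict(index)

for foaf_url, links in posts_by_author(FEED).items():
    print(foaf_url, "->", links)
```

Once posts are keyed by a FOAF URL, whatever the FOAF file says about who that person knows can be followed to reach other people's posts. Fetching and reading the FOAF files is left out of this sketch.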

Now that semantic social network is just a first pass at this. We want to create these connections on many levels. And so what we want is metadata, not simply created by the author of a post, but created by readers of posts. This is what I call 'third party metadata'. Third party metadata -- we're beginning to see some of this out there in the blogosphere, in a small, limited and usually site-based way, right? Links, references, readings, annotations, classifications, context of use. But it can't be site-based. Because that doesn't create a network. It might as well be random.

Now we've talked about this in the field of learning resources, because professors love ratings, but we could also do this in the blog world, with RSS. And it's very simple to do. You just create a tag, that looks just like any other 'item' tag, but you're not the author of that item, and you identify it in some way, usually through a link, and then you add your third party metadata. This is - the 'SSN' stands for 'Semantic Social Network', I made it up, 'commentary' is the type of third party metadata, I made it up, and then, who wrote it, and what they had to say. And that becomes third party metadata. It becomes information about the resource.
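The slide itself is not reproduced here, so the following Python sketch is only a guess at the general shape of such an item: an ordinary `item` that points at someone else's post by link and carries an `ssn:commentary` element saying who wrote the comment and what they said. The namespace URI, the child element names `author` and `text`, and the function name `commentary_item` are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the 'ssn' prefix; not a published vocabulary.
SSN = "http://example.org/ssn#"
ET.register_namespace("ssn", SSN)

def commentary_item(resource_url, reader_foaf_url, remark):
    """Build an RSS-style item describing a resource the reader did not write."""
    item = ET.Element("item")
    # Identify the resource being described, here by its link.
    ET.SubElement(item, "link").text = resource_url
    # The third party metadata: its type, who wrote it, what they had to say.
    commentary = ET.SubElement(item, f"{{{SSN}}}commentary")
    ET.SubElement(commentary, f"{{{SSN}}}author").text = reader_foaf_url
    ET.SubElement(commentary, f"{{{SSN}}}text").text = remark
    return item

item = commentary_item(
    "http://example.org/posts/1",
    "http://example.org/people/carol/foaf.rdf",
    "Useful overview, worth reading alongside the original paper.",
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Because the item lives in the reader's own feed and not on the site that hosts the post, anyone who aggregates that feed picks up the comment along with it.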

Now the way this should work, and the way I've proposed for this to work in the educational community, is that as much of this third party metadata as possible is created through automatic means. Now annotations aren't going to be created automatically. But a context of use will, right? If I look at a resource while I'm taking a physics class then the context of use of that resource is 'in physics'. And so I know, even if the resource is, like, a picture of a rabbit, I know that that picture is related to that subject, because I looked at that picture in that class. And the system that I'm using to look at that picture should note that, and log it. Now what's relevant? I looked at that picture. Now that attaches everything that anyone knows about me to that picture. And so we get enormously rich descriptions through very simple mechanisms of automatic classification.
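A small Python sketch of that kind of automatic logging, with the rabbit picture as the example. The class name `UsageLog`, its methods, and the sample viewers are hypothetical; a real system would record the view wherever the resource is actually displayed.

```python
from collections import defaultdict

class UsageLog:
    """Record who looked at which resource, and in what context."""

    def __init__(self):
        # resource URL -> set of (viewer, context) pairs
        self._views = defaultdict(set)

    def record_view(self, resource, viewer, context):
        """Called by the viewing system itself; nobody tags anything by hand."""
        self._views[resource].add((viewer, context))

    def contexts_for(self, resource):
        """Every context in which the resource has been used."""
        return sorted({context for _, context in self._views[resource]})

    def viewers_of(self, resource):
        """Everyone whose profile is now attached to the resource."""
        return sorted({viewer for viewer, _ in self._views[resource]})

log = UsageLog()
rabbit = "http://example.org/images/rabbit.jpg"
log.record_view(rabbit, "stephen", "physics")
log.record_view(rabbit, "dana", "physics")
log.record_view(rabbit, "dana", "biology")

print(log.contexts_for(rabbit))
print(log.viewers_of(rabbit))
```

The picture never gets a tag saying 'physics'. The association comes from the record of use, and each viewer brings along whatever else is known about them.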

My contention is that instead of the spike-based power-law-based Instapundit-based network, that when we get something like the semantic social network, and we will get something like the semantic social network, because it's very simple to do, patterns of organization will be created. In the field of neural networks and connectionism they call them 'clusters', you get a cluster phenomenon where we're not creating communities around a specific word, or specific concept, but the community itself emerges as being created by and defined as that particularly dense set of connections.

I've set up a system called Edu_RSS which is a very primitive first pass at this, and the idea here, Edu_RSS is an aggregator, there should be many instances of Edu_RSS, in the ideal world everybody would have something like this on their desktop, and it pulls in the link metadata, but it also pulls in rating metadata, and it doesn't pull it in from the entire world, the way Technorati does or the way Blogdex does, it pulls it in from my community, my network of friends. And if you set up the network in this way you can actually stop worrying about searching, because the network itself becomes the search where you go through layers of linking and so what comes out the other end is stuff that will be of interest to you. And if you're finely grained enough at the output end then you can get a very precise set of inputs. But the thing is, this set of inputs comes from the entire blogosphere of four million people rather than the randomly chosen top one hundred.
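The idea of the network acting as the search can be sketched in Python as follows. This is not Edu_RSS's actual code, which is not shown in the talk; the function names, the two-hop limit, the rating threshold, and the sample data are all assumptions made for the illustration.

```python
from collections import deque

def people_within(friends, me, max_hops):
    """Breadth-first walk: everyone reachable from `me` in at most `max_hops` links."""
    hops = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not look further out than the limit
        for friend in friends.get(person, []):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops

def recommended(friends, ratings, me, max_hops=2, min_rating=4):
    """Resources rated well by people in my network, not by the whole world."""
    network = people_within(friends, me, max_hops)
    picks = set()
    for person in network:
        if person == me:
            continue
        for url, score in ratings.get(person, {}).items():
            if score >= min_rating:
                picks.add(url)
    return sorted(picks)

# Hypothetical network and ratings.
friends = {"me": ["ana"], "ana": ["ben"], "ben": ["cy"]}
ratings = {
    "ana": {"http://example.org/a": 5},
    "ben": {"http://example.org/b": 4, "http://example.org/c": 2},
    "cy": {"http://example.org/d": 5},        # three hops away, too far
    "stranger": {"http://example.org/e": 5},  # not connected to me at all
}

print(recommended(friends, ratings, "me"))
# → ['http://example.org/a', 'http://example.org/b']
```

Anyone in the whole population can contribute a resource, but it only reaches me if it passes through the layers of people I am actually connected to. Tightening `max_hops` or `min_rating` is the fine-graining at the output end.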

The community is the network. There is no centralized place that constitutes community, there are only people, and resources, that are distributed, that are all acting on their own behalf and in their own interests - if you ever read Marvin Minsky's "The Society of Mind", it's like that - where the network consists of a set of self-selected relations using a variety of contextual information, that I've defined as third party metadata, to establish meaning, and where this meaning not only defines the community but emerges from the community.

And that's probably all of my time and I thank you very much for your patience.