Knowledge Base Integration

This article is the result of some reflections on an online learning portal being set up by the good people at Duncans MindLeaders On-Line Learning. My reaction isn't so much a response to their effort as it is a wider reflection on what it takes to use knowledge bases effectively.


I notice they place links into categories, though I couldn't find a portal-link page where the categories are displayed. But that said...

I've been wrangling with categories for some time now. I used to set up my knowledge base so that each link was associated with a single category (it was actually a field in the link table). Then I wanted links to fit into multiple categories, so I created a 'lookup' table (or 'link list').

When I submitted a link, I would also select a category for it (from a drop-down list). This ceased to be practical once I got up to about 250 categories. I tried various ways to generate category selection and finally settled on 'automatic' categorization, whereby the contents of the submission would be scanned for a regular expression and the category (or categories) assigned based on matches.
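
A minimal sketch of that kind of automatic categorization, written in Python rather than the Perl the actual system uses. The category names and regular expressions here are made up for illustration; they are not the real ones.

```python
import re

# Hypothetical category patterns, for illustration only.
CATEGORY_PATTERNS = {
    "LCMS": re.compile(r"\bLCMS\b|learning content management", re.IGNORECASE),
    "Instructional Design": re.compile(r"instructional design", re.IGNORECASE),
    "RSS": re.compile(r"\bRSS\b|syndicat", re.IGNORECASE),
}

def categorize(text):
    """Return the sorted names of every category whose pattern matches the text."""
    return sorted(
        name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(text)
    )

submission = "A short guide to syndicating LCMS content with RSS feeds."
print(categorize(submission))
```

A submission can land in several categories at once, or in none, depending entirely on what the patterns happen to match.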

I've concluded after several years of this that none of this is worth the effort. Categories are too fluid - something that was a subcategory last year ought to be a category this year, and all that really depends on your point of view anyways. Also, any meaningful categorization schema is going to have hundreds - even thousands - of entries, and so it's as hard to find the category you want as it is to find the actual entry.

So: I redefined what I mean by a category. Now, what I think of as a category is a 'pre-defined search'. This allows me to add, delete, amend, reconfigure, etc., categories as much as I want without worrying about the integrity of the entire system. Cross categorization, which used to be a big headache, is now simple. And I never worry about assigning an entry to a category.
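
To make the 'pre-defined search' idea concrete, here is a rough sketch in Python. It assumes a simple substring search and invented category names; the real search is surely richer. The point is only that the category lives outside the link record.

```python
from dataclasses import dataclass

@dataclass
class Link:
    title: str
    description: str

# A category is just a name plus a stored query; nothing is written on the link.
# These names and queries are hypothetical.
categories = {
    "LCMS": "lcms",
    "Design": "instructional design",
}

def links_in(category, links):
    """Run the category's stored search over the current set of links."""
    term = categories[category].lower()
    return [
        link for link in links
        if term in f"{link.title} {link.description}".lower()
    ]

links = [
    Link("Choosing an LCMS", "Notes on vendors and instructional design support."),
    Link("Writing good code", "A style guide."),
]

# The first link falls into both categories without being assigned to either.
print([link.title for link in links_in("LCMS", links)])
print([link.title for link in links_in("Design", links)])

# Redefining a category is a one-line change; no link records are touched.
categories["Design"] = "style guide"
print([link.title for link in links_in("Design", links)])
```

Because membership is computed when the category is viewed, adding, deleting or amending a category never leaves a stale assignment behind.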


When I looked at their KB I followed their suggestions and saw the link, attributed to me, that you had placed into the system. Which led me to the question: how did it get there?

I ask this question from a technical point of view, not a content point of view (you can use any of my links that you wish). The short story is, there are two ways it could get there: manually (e.g., via cut and paste from the newsletter) or automatically (e.g., via one of my data feeds, such as the RSS version of the newsletter).

My Knowledge Base runs on a custom-built relational database written in Perl (I didn't like what was out there so I wrote my own relational DB software). It runs on standard RDB principles so in theory your DB could read what my DB produces and vice versa. Indeed, that's the whole point of having multiple versions of the link DB, and especially the XML version (currently only an RSS feed, but with many more planned).

My KB also has a subroutine I call 'grasshopper' that is essentially what people call a 'scraper' or 'aggregator' - it contacts external sites, reads the HTML or RSS, brings back the data, formats it, filters it (according to whether it matches any of my categories, where categories are, recall, pre-defined searches) and tosses it into (a preview area of) my KB.
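
The harvesting steps can be sketched roughly as follows. This is not the grasshopper code itself (which is Perl); it is a small Python illustration that assumes an RSS 2.0 feed and substring-style category searches, and uses an inline sample feed in place of a real network fetch. The feed contents, URLs and category terms are invented.

```python
import xml.etree.ElementTree as ET

# A tiny inline RSS document stands in for a feed fetched from a remote site.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>LCMS roundup</title><link>http://example.com/1</link>
<description>New LCMS products this month.</description></item>
<item><title>Gardening tips</title><link>http://example.com/2</link>
<description>Nothing about learning here.</description></item>
</channel></rss>"""

# Categories as pre-defined searches (hypothetical terms).
CATEGORY_SEARCHES = {"LCMS": "lcms", "RSS": "rss"}

def harvest(rss_text):
    """Parse a feed and keep only items that match at least one category search."""
    preview = []
    for item in ET.fromstring(rss_text).iter("item"):
        entry = {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        haystack = f"{entry['title']} {entry['description']}".lower()
        matched = [
            name for name, term in CATEGORY_SEARCHES.items() if term in haystack
        ]
        if matched:
            entry["categories"] = matched
            preview.append(entry)  # lands in the preview area, not the KB proper
    return preview

for entry in harvest(SAMPLE_RSS):
    print(entry["title"], entry["categories"])
```

Items that match no category are simply dropped, so the preview area only ever holds material that some existing search would find.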

Seems to me that this is a lot easier than cut-and-paste, and indeed, if you set up a 'what's new' view of your knowledge base, I will scan it regularly and pull the info. Even better if you provide an RSS view.

Distributed DBs

Now we can actually take this even a step further and ask: why is my link in their KB at all? Again, this is a purely technical question.

I actually have several instances of my KB software running for different projects. I am trying to set it up so that you can define searches across these independently maintained KBs. Thus, e.g., if I search for 'LCMS', it will scan my KB, then yours, then any of a number of related KBs to retrieve the data.
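
A rough Python sketch of such a cross-KB search. Each KB is represented here by a plain search function over an in-memory list; in a real setup these would be calls to remote systems. The KB names and entries are hypothetical.

```python
from typing import Callable, Dict, List

def make_kb(entries: List[str]) -> Callable[[str], List[str]]:
    """Build a stand-in KB: a function that searches a fixed list of entries."""
    def search(term: str) -> List[str]:
        return [entry for entry in entries if term.lower() in entry.lower()]
    return search

# Independently maintained KBs, searched in the order listed.
KNOWLEDGE_BASES: Dict[str, Callable[[str], List[str]]] = {
    "mine": make_kb(["LCMS buyer's guide", "RSS for educators"]),
    "yours": make_kb(["Rating LCMS vendors", "Course design basics"]),
}

def federated_search(term):
    """Query each KB in turn and tag every hit with the KB it came from."""
    results = []
    for name, search in KNOWLEDGE_BASES.items():
        try:
            results.extend((name, hit) for hit in search(term))
        except Exception:
            # One unavailable KB should not sink the whole search.
            continue
    return results

print(federated_search("LCMS"))
```

Tagging each hit with its source keeps it clear where an entry actually lives, which matters once nobody is copying entries between systems.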

Then there's no need for you to input my entry into your KB at all. You only need a means of accessing it. Only if you want to do something over and above what I do (e.g., add the rating system) would you want to store it on your own system.


When I looked at their version of the entry I 'contributed', what I saw was a word-for-word replication of what I had written. Now that is probably an artifact of your demo for me, but it raised some questions.

When I say 'value-add' what I mean is the addition of reflection and contextualization. Thus, for example, when I add an item to my KB I will provide an assessment of the resource (is it 'light'? 'detailed'? 'authoritative'?). But more, I will discuss what role this resource plays in a larger picture. Does it contribute to the ongoing DCMA debate, for example? Does it add to our understanding of LCMS theory? And finally, I often indicate whether I agree or disagree with the content of the resource, whether I have a quibble, whether I think they've made some point that needs highlighting or refuting.

This is the 'value-add' and I think it provides much more information than a simple ranking. But also: because my search works over the contents of the listing (as opposed to the contents of the resource being listed), it creates better search results. Thus if I say a certain resource that talks about how to write good code can also be used to evaluate good instructional design, the resource will show up in a search for 'instructional design' even though it never actually talks about it. Thus my search is, itself, a form of value-add.
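
The difference between searching the listing and searching the resource can be shown with a small Python sketch. The two entries, their URLs and the field names are invented for the example.

```python
# Each listing keeps the resource's own text separate from the commentary
# written about it. All data here is hypothetical.
listings = [
    {
        "url": "http://example.com/good-code",
        "resource_text": "How to write good code: naming, structure, testing.",
        "commentary": "The same criteria can be used to evaluate good instructional design.",
    },
    {
        "url": "http://example.com/lcms",
        "resource_text": "An overview of LCMS products.",
        "commentary": "Light, but a useful starting point.",
    },
]

def search(term, field):
    """Return the URLs of listings whose chosen field mentions the term."""
    term = term.lower()
    return [entry["url"] for entry in listings if term in entry[field].lower()]

# Searching the resource itself finds nothing; searching the commentary does.
print(search("instructional design", "resource_text"))
print(search("instructional design", "commentary"))
```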


Their KB looks like most KBs in that it is a stand-alone project. At least, that's how it appears. But I think that people don't go to KBs to do searches - or more accurately, there are many more useful ways to use a KB than to have people go to it and search.

My own KB engine extracts and integrates lists of resources (or even single resources) into web pages, for example. Now this is no great invention - Cold Fusion and ASP have done that for years. But it is an important use: it means you can create a relevant up-to-date list of resources on any web page.

I use this to create my newsletter. My newsletter page is simply a command to extract all the links from a given project that have been submitted in the last 20 hours (my weekly newsletter is exactly the same except that it's all the links from the last 120 hours).
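
In Python terms, the newsletter command amounts to a time-window filter. The sketch below uses the 20-hour and 120-hour windows mentioned above; the link records and the reference time are arbitrary values chosen so the example is repeatable.

```python
from datetime import datetime, timedelta

def submitted_within(links, hours, now):
    """Return titles of links submitted during the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    return [link["title"] for link in links if link["submitted"] >= cutoff]

# An arbitrary fixed reference time, so the output does not depend on the clock.
now = datetime(2001, 6, 15, 12, 0)

links = [
    {"title": "Posted this morning", "submitted": now - timedelta(hours=3)},
    {"title": "Posted two days ago", "submitted": now - timedelta(hours=50)},
    {"title": "Posted last week", "submitted": now - timedelta(hours=200)},
]

daily = submitted_within(links, 20, now)
weekly = submitted_within(links, 120, now)
print(daily)
print(weekly)
```

The two editions share one function and differ only in the window passed to it.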

The big weakness of Cold Fusion and ASP, though, is that these lists of resources can only be placed on a page sitting on the same server as the database (or that has direct access to it, in the case of a networked server environment). I could not, for example, place your resources on my home page.

But you could place my resources on your home page. You could specify exactly what you want to see ('the last 5 resources that mention the word grasshopper', say). The idea here is that my KB can be used by any person on any page. It could thus be integrated, for example, into an online course that uses WebCT.

The key idea here is to move the information from the KB to the remote location where it is actually needed. People should (almost) never have to go to the KB - the KB goes to them.
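
One way to picture this: the KB answers a small request ('keyword', 'how many') with a ready-made HTML fragment that a remote page can drop in. The Python sketch below shows only the fragment-building half and assumes the resources are stored oldest first; how the fragment travels to the remote page is left open. The resource data is invented.

```python
from html import escape

# Hypothetical resources, stored oldest first.
RESOURCES = [
    {"title": "Grasshopper notes", "url": "http://example.com/a",
     "text": "How the grasshopper harvester works."},
    {"title": "LCMS review", "url": "http://example.com/b",
     "text": "Vendors compared."},
    {"title": "Feeds & harvesting", "url": "http://example.com/c",
     "text": "Using grasshopper with RSS."},
]

def embed_fragment(keyword, limit):
    """Return an HTML list of the newest `limit` resources mentioning the keyword."""
    matches = [r for r in RESOURCES if keyword.lower() in r["text"].lower()]
    newest_first = list(reversed(matches))[:limit]
    items = "".join(
        f'<li><a href="{escape(r["url"], quote=True)}">{escape(r["title"])}</a></li>'
        for r in newest_first
    )
    return f"<ul>{items}</ul>"

# 'The last 5 resources that mention the word grasshopper'
print(embed_fragment("grasshopper", 5))
```

Titles and URLs are escaped before being written into the fragment, since the page receiving it has no control over what the KB contains.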


I haven't built this in yet but I'm going to. It seems to me that any resource ought to be able to spawn a 'conversation'. People should be able to comment on the resource; these comments in turn become resources in their own right, feeding back to the original resource.
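
Since this part isn't built, what follows is only one possible shape for it, sketched in Python: a comment is stored as an ordinary resource that happens to point at a parent. The class and field names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Resource:
    id: int
    text: str
    parent_id: Optional[int] = None  # set when this resource comments on another

class KnowledgeBase:
    def __init__(self):
        self.resources: List[Resource] = []

    def add(self, text, parent_id=None):
        """Store a resource; a comment is just a resource with a parent."""
        resource = Resource(len(self.resources) + 1, text, parent_id)
        self.resources.append(resource)
        return resource

    def conversation(self, resource_id):
        """Return every comment under a resource as (depth, resource) pairs."""
        thread = []

        def walk(parent, depth):
            for resource in self.resources:
                if resource.parent_id == parent:
                    thread.append((depth, resource))
                    walk(resource.id, depth + 1)

        walk(resource_id, 1)
        return thread

kb = KnowledgeBase()
article = kb.add("An overview of LCMS products")
first = kb.add("Useful summary", parent_id=article.id)
reply = kb.add("I disagree on one point", parent_id=first.id)
second = kb.add("Another view", parent_id=article.id)

for depth, comment in kb.conversation(article.id):
    print("  " * depth + comment.text)
```

Because comments sit in the same store as everything else, they would show up in searches and feeds like any other resource.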
