MBD: Semantic Wikis


Rich Morin, rdm@cfcl.com


Semantic Wikis show promise for creating models and presenting information. Ideally, they would combine a wiki's convenience and freedom with the strengths of semantically aware (e.g., ontology-based) systems. This could allow a graceful merging of human-edited and mechanically generated content.

Background

Wikis provide a very convenient means for generating informal documentation. They are easy to edit, support both individual and collaborative efforts, and scale extremely well.

Wiki links can be created by the simple act of typing in a CamelCase word. If a page doesn't already exist, the act of clicking on its link will create one. Simplified markup languages are also available, easing the process of page creation.
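The CamelCase convention amounts to a pattern match over page text. The sketch below is a minimal illustration: the regular expression, the `/wiki/...` URL scheme, and the `?action=edit` parameter are assumptions, not the rules of any particular wiki engine.

```python
import re

# A CamelCase word: two or more capitalized "humps", e.g. FrontPage.
# (Illustrative pattern; real wikis differ in the details.)
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")

def render_links(text: str, existing_pages: set[str]) -> str:
    """Turn CamelCase words into links, using a hypothetical /wiki/ URL scheme."""
    def to_link(match: re.Match) -> str:
        name = match.group(0)
        if name in existing_pages:
            return f'<a href="/wiki/{name}">{name}</a>'
        # Missing page: following this link would open an editor to create it.
        return f'{name}<a href="/wiki/{name}?action=edit">?</a>'
    return CAMEL_CASE.sub(to_link, text)
```

Single capitalized words ("Wiki") and all-caps acronyms ("HTML") are deliberately left alone, which is why CamelCase is used as the link signal.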

Most wikis have rollback mechanisms, allowing contributions or changes to be removed, merged with previous content, etc. This allows public wikis such as Wikipedia to reach and retain a high level of quality, despite occasional misguided postings.
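Rollback is easy to picture if a page keeps its full revision history. In this sketch, `Page` is a hypothetical class (not any real wiki's API), and reverting republishes an old revision instead of deleting the newer ones.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    """A wiki page that keeps every revision (hypothetical minimal model)."""
    name: str
    revisions: list[str] = field(default_factory=list)

    def edit(self, text: str) -> None:
        self.revisions.append(text)

    @property
    def current(self) -> str:
        return self.revisions[-1] if self.revisions else ""

    def rollback(self, to_revision: int) -> None:
        # Re-publish an earlier revision as a new one, so the unwanted change
        # stays in the history and can itself be undone later.
        self.edit(self.revisions[to_revision])
```

Keeping the bad edit in the history, rather than erasing it, is what lets changes be merged or restored later.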

The web's basic architecture reduces the apparent complexity of web site (and thus wiki) generation. Although collections of pages and links form a graph-based data structure, few users think about this fact. Looking at any given page, the user sees only content and links; the global structure can be (and usually is) ignored.

As sites such as Wikipedia demonstrate, wikis can take full advantage of these simplifications. Their users navigate through enormous graphs, seeing only pages that appear to be of interest.

Web Limitations

HTML links (e.g., <A HREF="...">...</A>) have limitations that interfere with their use in semantic wikis. Specifically, they are uni-directional, untyped, and binary. The first problem can be worked around fairly easily, but the others require structural changes.

Because a link records only its target, not the pages that point to it, the entire graph must be traversed in order to find backlinks (links that come from other pages). For search engines such as Google, this can be a massive problem, because the "graph" in question is the entire web.
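Finding backlinks means inverting the forward-link map. The sketch assumes the whole link map fits in memory, which is realistic for a wiki but not for the web. The page names and the `build_backlink_index` helper are hypothetical.

```python
from collections import defaultdict

# Forward links only, as in plain HTML: each page knows only its targets.
# (Illustrative page names.)
links = {
    "EtcPasswd": ["ControlFile", "BinPasswd"],
    "BinPasswd": ["EtcPasswd", "Program"],
    "ControlFile": [],
    "Program": [],
}

def build_backlink_index(forward: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the forward-link map with one pass over every page and link."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for source, targets in forward.items():
        for target in targets:
            backlinks[target].add(source)
    return dict(backlinks)

print(build_backlink_index(links)["EtcPasswd"])  # {'BinPasswd'}
```

A wiki could also update such an index incrementally whenever a page is saved, avoiding repeated full scans.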

Few wikis bother to track backlinks, even though the problem is much more tractable for them. Even fewer can display clickable context diagrams, showing a page's "local neighborhood". Pimki (an experimental "Personal Information Management" wiki) does both, but it is a conspicuous exception.
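A "local neighborhood" of the kind a context diagram displays can be computed by treating forward links and backlinks alike, then walking outward a fixed number of hops. This breadth-first sketch uses a hypothetical `neighborhood` helper and toy data; it is not Pimki's implementation.

```python
from collections import deque

def neighborhood(page: str, forward: dict[str, list[str]], radius: int = 1) -> dict[str, int]:
    """Return {page: hop distance} for pages within `radius` links, in either direction."""
    # Undirected adjacency: every forward link also counts as a backlink.
    adjacent: dict[str, set[str]] = {}
    for source, targets in forward.items():
        for target in targets:
            adjacent.setdefault(source, set()).add(target)
            adjacent.setdefault(target, set()).add(source)

    seen = {page: 0}
    queue = deque([page])
    while queue:
        current = queue.popleft()
        if seen[current] == radius:
            continue  # don't expand beyond the requested radius
        for nxt in adjacent.get(current, ()):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    return seen

# Illustrative data: BinPasswd -> EtcPasswd -> ControlFile -> FileType
links = {"BinPasswd": ["EtcPasswd"], "EtcPasswd": ["ControlFile"], "ControlFile": ["FileType"]}
```

A context diagram would then draw the returned pages, perhaps sizing or fading them by distance.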

Even Pimki, however, is constrained by HTML links' other limitations. Although a link can have many attributes, most links contain only the target URL and some text to be highlighted and displayed. In any case, nothing indicates what "type" a given link has.

Without typed links (e.g., Is_A, Has_A, Used_By), Pimki has very little information to work with. It cannot, for example, filter by link type or assess the "strength" of given links, much less make deductions (e.g., inherited characteristics) based on link types.
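Once links carry types, both filtering and simple deduction become straightforward. In this sketch, links are stored as (source, type, target) tuples and characteristics are inherited along Is_A links; the data and helper names are illustrative assumptions.

```python
# Typed links as (source, link_type, target); illustrative data.
typed_links = [
    ("EtcPasswd", "Is_A", "ControlFile"),
    ("ControlFile", "Is_A", "TextFile"),
    ("BinPasswd", "Is_A", "Program"),
    ("EtcPasswd", "Used_By", "BinPasswd"),
]
# Characteristics asserted directly on each page (illustrative).
properties = {
    "TextFile": {"line_oriented"},
    "ControlFile": {"admin_editable"},
    "EtcPasswd": {"world_readable"},
}

def links_of_type(page: str, link_type: str) -> list[str]:
    """Filter a page's outgoing links by type."""
    return [t for s, lt, t in typed_links if s == page and lt == link_type]

def inherited_properties(page: str) -> set[str]:
    """Collect a page's own characteristics plus those of every Is_A ancestor."""
    result: set[str] = set()
    stack, seen = [page], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against Is_A cycles
        seen.add(current)
        result |= properties.get(current, set())
        stack.extend(links_of_type(current, "Is_A"))
    return result
```

Nothing in an untyped HTML link would let a program distinguish the Is_A link from the Used_By link here.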

Finally, HTML links are "binary", in the sense that they only connect two pages. This isn't a catastrophic problem: Resource Description Framework (RDF) is also based on binary links. However, many users may find it easier to say "John is taking the plane to Chicago" than to specify the equivalent set of binary relationships.
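One common way to express an n-ary statement with binary links is to introduce an intermediate node for the event itself and attach each participant to it. The role names (`traveler`, `vehicle`, `destination`) and the blank-node-style identifier below are assumptions for illustration.

```python
from itertools import count

_ids = count(1)  # source of fresh identifiers for event nodes

def nary_to_triples(relation: str, **roles: str) -> list[tuple[str, str, str]]:
    """Replace one n-ary statement with binary triples about a new event node."""
    event = f"_:{relation}{next(_ids)}"  # blank-node-style identifier (illustrative)
    triples = [(event, "type", relation)]
    for role, value in roles.items():
        triples.append((event, role, value))
    return triples

# "John is taking the plane to Chicago" becomes four binary relationships.
triples = nary_to_triples("Trip", traveler="John", vehicle="Plane", destination="Chicago")
```

The single sentence turns into four triples, which illustrates how binary-only links multiply the number of relationships a user must state.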

So, add structure!

Semantic wikis address the "type" problem head-on, allowing pages and/or links to have specified types. Thus, we might say that the /etc/passwd page deals with an instance of the class Control_File. With this information, the wiki can generate bi-directional links, display summary or inherited information, etc.
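Declaring a page's type lets the wiki record the reverse link at edit time. In this sketch, the hypothetical `TypedWiki` class keeps a class-to-instances index, so a Control_File page could list its instances without scanning the wiki. The `/etc/group` page is an illustrative addition.

```python
from collections import defaultdict

class TypedWiki:
    """Hypothetical store: each page may declare the class of thing it describes."""

    def __init__(self) -> None:
        self.page_class: dict[str, str] = {}                      # page -> class
        self.instances: dict[str, list[str]] = defaultdict(list)  # class -> pages

    def declare(self, page: str, cls: str) -> None:
        # Recording both directions at edit time means neither page
        # has to be found later by scanning the whole wiki.
        old = self.page_class.get(page)
        if old is not None:
            self.instances[old].remove(page)
        self.page_class[page] = cls
        self.instances[cls].append(page)

    def summary(self, cls: str) -> str:
        """A generated instance list, such as a class page might display."""
        return f"{cls}: " + ", ".join(sorted(self.instances[cls]))

wiki = TypedWiki()
wiki.declare("/etc/passwd", "Control_File")
wiki.declare("/etc/group", "Control_File")
print(wiki.summary("Control_File"))  # Control_File: /etc/group, /etc/passwd
```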

Similarly, if we say that a Control_File may be written by a Program (or really, by a Process that is running the Program), the wiki knows that it's OK for a user to assert that the /etc/passwd file may be written by the /bin/passwd program.
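A schema of permitted (subject class, relation, object class) combinations is enough to vet such assertions. The schema, the page classes, and the `Written_By` relation name below are assumptions for illustration.

```python
# Hypothetical schema: which (subject class, relation, object class) triples are allowed.
SCHEMA = {("Control_File", "Written_By", "Program")}

# Hypothetical class declarations for individual pages.
PAGE_CLASS = {"/etc/passwd": "Control_File", "/bin/passwd": "Program"}

def check_assertion(subject: str, relation: str, obj: str) -> bool:
    """Return True if the schema permits this relation between the pages' classes."""
    key = (PAGE_CLASS.get(subject), relation, PAGE_CLASS.get(obj))
    return key in SCHEMA

print(check_assertion("/etc/passwd", "Written_By", "/bin/passwd"))  # True
print(check_assertion("/bin/passwd", "Written_By", "/etc/passwd"))  # False
```

A wiki could use such a check to accept an assertion, warn the user, or offer only the relations that make sense for a page's class.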

The final limitation (binary relationships) is addressed by some semantic wikis, but not by others. In fact, some would contend that this limitation is beneficial, in that it simplifies the specification of relationships. However, the complexity reasserts itself by greatly expanding the number of relationships required.

Some Examples

Here are more than a dozen examples of semantic wikis, including (where available) brief summaries. For an up-to-date list, and other useful information, be sure to visit the Semantic Wiki State Of The Art page on the Semantic Wiki Interest Group web site.

Note: The "OWW" link following some entries is not an exclamation, merely a link to the corresponding page in the OntoWorld Wiki.

  • COW (Combining Ontologies with Wikis) is written in Java, using the Java Development Kit (JDK). It uses Apache Tomcat as a web container and either Microsoft SQL Server (MSSQL) or PostgreSQL as a database.

    COW provides explicit capabilities for defining concepts and instances, making queries, etc. The COW and mWiki source code is available for download, with no apparent copyright or license notices. (OWW)

  • Gedankenspiel ("thought play") is written in Common Lisp. Every page maps to a "concept", which may be a category, property, etc. The program's distribution status is unknown. (OWW)

  • IkeWiki is written in Java, using the Java Development Kit (JDK). It uses Apache Tomcat as a web container and PostgreSQL as a database. IkeWiki is available under the GNU Lesser General Public License (LGPL). (OWW)

  • KawaWiki uses RDF/OWL data, as described by a (Japanese-language) paper. The program is being written in PHP, using vOWLidator and RAP (RDF API for PHP). The developers plan to make the system publicly available through the Internet. (OWW)

  • Kendra Base is written in Python. It uses PostgreSQL as a database. The wiki is being developed to support the needs of the KendraProject. Although no specific license has been chosen yet, the expectation is that the software will be Open Source. (OWW)

  • OntoWiki is described in the speculative paper OntoWiki: Community-driven Ontology Engineering and Ontology Usage based on Wikis. (OWW)

  • PlatypusWiki is written in Java; Pytypus, a re-factored version, is under development in Python. (OWW)

  • pOWL is written in PHP, using an ADOdb-compliant database. pOWL is available under the GNU General Public License. (OWW)

  • Rhizome is written in Python, using the 4Suite XML and RDF libraries and, optionally, the Lupy, Redland, or RDFlib data store. Rhizome is available under the GNU General Public License. (OWW)

  • Rise (Reuse In Software Engineering) is written in Java, using JavaServer Pages (JSP) and Java 2 Platform, Enterprise Edition (J2EE). Rise is currently proprietary, but parts of the software will be released as Open Source. (OWW)

  • Semantic MediaWiki (Japan) is written in PHP as a MediaWiki extension. (OWW)

  • Semantic MediaWiki (WikiProject) (aka SeMediaWiki) is written in PHP as a MediaWiki module. It is available under the GNU General Public License. (OWW)

    A paper about this project ("Semantic Wikipedia") has been submitted. (OWW)

  • SemperWiki (Semantic Personal Wiki) is written in Ruby, using the GIMP Toolkit (GTK+) and the Redland RDF Application Framework. (OWW)

    Because SemperWiki runs on the user's machine, it can take advantage of local information (e.g., looking up or making annotations on local files). In addition, the use of GTK+ offers a more powerful GUI than a web browser can provide.

  • SemWiki (Semantic Wiki) is written in Java, using Apache Tomcat as a web container. (OWW)

  • SweetWiki (Semantic WEb Enabled Technology Wiki) is based on the Corese Semantic Web Factory, a semantic web search engine based on Conceptual Graphs. (OWW)

  • WikiOnt (Wiki Ontology) is an ontology (i.e., vocabulary specification) for Wikipedia articles. (OWW)

  • WikSAR (Wiki Semantic Authoring and Retrieval) is written in Perl. Distribution and licensing status are unknown. (OWW)

Commentary

Nearly all of the wikis listed above are based on the RDF notion of (subject, predicate, object) expressions, known as "triples". In each link, the current and target pages are used, respectively, as the subject and object of the triple. The predicate is then added by means of a "type" indicator, such as:

[[type:IsWrittenBy /bin/passwd]]

This is convenient and flexible (and thus in keeping with the spirit of wikis), but it has both structural and human interface problems. Ontiki, my (speculative) design for an "ontology-aware wiki", attempts to address these issues.
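The link-to-triple mapping described above can be sketched with a regular expression, assuming the `[[type:Predicate Target]]` form shown earlier. Real semantic wikis differ in their markup, so the pattern and the `extract_triples` helper are illustrative only.

```python
import re

# Matches the illustrative form [[type:Predicate Target]].
TYPED_LINK = re.compile(r"\[\[type:(\w+)\s+([^\]\s]+)\]\]")

def extract_triples(page_name: str, text: str) -> list[tuple[str, str, str]]:
    """The current page is the subject; each typed link supplies predicate and object."""
    return [(page_name, predicate, target)
            for predicate, target in TYPED_LINK.findall(text)]

print(extract_triples("/etc/passwd", "Updated by [[type:IsWrittenBy /bin/passwd]]."))
# [('/etc/passwd', 'IsWrittenBy', '/bin/passwd')]
```

Because any page may invent any predicate name, nothing here checks that `IsWrittenBy` is meaningful for the pages involved, one of the structural weaknesses of this approach.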

There is no consensus as to what semantic wikis should do, let alone how they should do it. A "category killer" may emerge in a few years, but even this is not certain: none has emerged for basic wikis. Nonetheless, semantic wikis are fun to play with and can be used to solve problems that other tools do not. So, enjoy...
