MBD: Modeling

Rich Morin, rdm@cfcl.com

Modeling efforts can usefully range from informal sketches to formal, notation-heavy specifications. My preference is to keep things informal and flexible until the need for more formality becomes evident. This is in line with some of the ideas of the Agile software development movement.

Diagrams are very useful for showing connectivity (e.g., control and data flow, method usage). If all you're trying to do is capture the basic structure, use any icons (e.g., boxes, circles) that seem comfortable. Picking a consistent notation becomes important, however, if the reader is separated by time or distance.

Similarly, although paper and whiteboards work well for small diagrams and informal meetings, diagram generation tools (e.g., Dia, OmniGraffle, Visio) also have their place. Aside from handling presentation details (e.g., icon and arrow styles, fonts, layout), these tools manage connectivity constraints, etc.

As the size and complexity of the model grows, the need for even more structure and support will become evident:

  • Is the notation defined and documented, so that others can find out what each graphic detail implies?

  • Does the tool allow arbitrary comments and metadata?

  • Can you work with a subset or view of the system?

  • Can the tool examine the model, finding deficiencies and reporting on its overall structure?

  • Can the tool draw conclusions from the model?

Just as general-purpose drawing tools cannot support architectural, electronic, or mechanical design, diagram generation tools cannot analyze the diagrams they record.
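
As a rough illustration of what "examining the model" might mean, here is a minimal Python sketch. It assumes the model is stored as a dictionary of nodes and their references; that layout and the names `model`, `find_dangling`, and `find_isolated` are hypothetical, invented for this example, and do not describe any particular tool.

```python
# Hypothetical toy model: each node maps to the set of nodes it refers to.
model = {
    "parser": {"lexer", "symbol_table"},
    "lexer": set(),
    "symbol_table": set(),
    "optimizer": {"ir"},   # "ir" is never defined: a deficiency
    "logger": set(),       # nothing uses it and it uses nothing
}

def find_dangling(model):
    """Return references to nodes that the model never defines."""
    return sorted({target
                   for targets in model.values()
                   for target in targets
                   if target not in model})

def find_isolated(model):
    """Return nodes that have no incoming or outgoing references."""
    referenced = set().union(*model.values()) if model else set()
    return sorted(node for node, targets in model.items()
                  if not targets and node not in referenced)

print(find_dangling(model))
print(find_isolated(model))
```

A drawing tool records the boxes and arrows, but it has no equivalent of these two functions. A modeling tool can run checks like these because it stores the structure, not just the picture.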

Knowledge Representation

Tools for generating and analyzing conceptual schemas must be able to represent and manage knowledge about the system under study. Here is an elegant description of knowledge representation, taken from John F. Sowa's excellent book on the topic:

    Knowledge representation is a multidisciplinary subject that applies theories and techniques from three other fields:

    1. Logic provides the formal structure and rules of inference.

    2. Ontology defines the kinds of things that exist in the application domain.

    3. Computation supports the applications that distinguish knowledge representation from pure philosophy.

    Without logic, a knowledge representation is vague, with no criteria for determining whether statements are redundant or contradictory. Without ontology, the terms and symbols are ill-defined, confused, and confusing. And without computable models, the logic and ontology cannot be implemented in computer programs. Knowledge representation is the application of logic and ontology to the task of constructing computable models for some domain.

Most of the modeling methods and tools discussed below are aimed at assisting with ontology development. They keep track of the definitions of classes, instances, relationships, etc. If our goal were to create an expert system, both the ontology and the logic rules would need to be extremely detailed and precise.

In MBD, however, most of the reasoning will be done by humans. The developer will look over the ontology and decide what to present. The user will look over the presented material and decide which parts are currently of interest. As long as the material is plausibly interesting, the user is unlikely to complain.

So, these approaches and tools often strive for more detail and precision than MBD requires. Don't get caught up in the details; your main objective is to produce a useful but simplified model!

Also note that knowledge representation is an active research area. There are many approaches and theories, a few emerging standards, and little interoperability between existing tools.

Methods and Tools

Assorted communities (e.g., AI, DBMS, Semantic Web) have developed methods and tools for knowledge representation. Several of these approaches appear to be quite applicable to MBD, but I have yet to find a category killer. Before we look at the available offerings, let's consider the general characteristics we're looking for.

Organization

The most critical characteristic, from my perspective, is the model's fundamental organization. The components of a system can have arbitrary relationships; the model must be able to encode these, allow the user to traverse them, etc.

Because the relationships are arbitrary and (initially) unknown, the approach must not restrict the modeler to, say, a list-based or even hierarchical organization. Consequently, most outliners and mind mapping tools aren't suitable.

Because HTML links can only be traversed in one direction, most HTML-based approaches (e.g., typical wikis) are also unsuitable. (Pairs of links can be used for bi-directional relationships, but this is tedious and error-prone if done manually.)

In addition, the model should allow the user to interact with assorted subsets and views of the system. For these and other reasons, I believe that the model must be based on a fully-traversable (and very extensible) graph-based organization.
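
The following Python sketch shows one way such an organization could look. It is only an assumption about the general shape, not a description of any existing tool; the class `ModelGraph`, its methods, and the sample node names are all hypothetical. Each relationship is recorded once but indexed from both ends, and a view is just a smaller graph built from a chosen subset of nodes.

```python
from collections import defaultdict

class ModelGraph:
    """Toy labeled graph that indexes every edge in both directions."""

    def __init__(self):
        self._out = defaultdict(set)   # source -> {(label, target)}
        self._in = defaultdict(set)    # target -> {(label, source)}

    def relate(self, source, label, target):
        # One call records the edge for traversal from either end.
        self._out[source].add((label, target))
        self._in[target].add((label, source))

    def outgoing(self, node):
        return sorted(self._out.get(node, ()))

    def incoming(self, node):
        return sorted(self._in.get(node, ()))

    def view(self, nodes):
        """Return a new graph restricted to the given nodes."""
        keep = set(nodes)
        sub = ModelGraph()
        for source, edges in self._out.items():
            for label, target in edges:
                if source in keep and target in keep:
                    sub.relate(source, label, target)
        return sub

g = ModelGraph()
g.relate("cron", "runs", "backup.sh")
g.relate("backup.sh", "writes", "/var/backups")
g.relate("restore.sh", "reads", "/var/backups")

# Traverse "backwards": what touches /var/backups?
print(g.incoming("/var/backups"))
# Work with a subset of the system.
print(g.view({"cron", "backup.sh"}).outgoing("cron"))
```

Because the reverse index is maintained automatically, the modeler never has to create the paired links by hand that a plain HTML approach would require.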

Diagramming Format

A modeling tool must allow the user to interact with (e.g., view, navigate, edit) the model. Although most of the detailed information will be textual in nature, text is a poor medium for presenting relationships. So, most modeling tools use some sort of diagramming format.

The design of this format is both critical and challenging. If the format is too simple, it won't be able to convey the needed information. If it is too complex, the user will become confused and frustrated. Ideally, the tool should allow the user to use simplified notation, adding details as desired.

Interchange Format

Modeling tools should have reliable and convenient ways to exchange information with other tools. Unfortunately, this is seldom going to be the case for existing tools. There are many formats for encoding conceptual models, varying at syntactic, structural, and semantic levels.

Efforts are being made to provide paths between these formats. For example, the International Organization for Standardization (ISO) has a working group which recently proposed a standard:

    Common Logic (CL) is an information exchange and transmission language, based on first-order logic. The CL definition allows a variety of different syntactic forms, called "dialects". A dialect may use any desired syntax or structure, but it must be equivalent to the abstract syntax of Common Logic (and thus, to any other CL dialect), in terms of its semantics.

    - Common Logic entry, in Wikipedia

The World Wide Web Consortium (W3C) is working on a related, though less formal standard:

    SKOS Core is a model and an RDF vocabulary for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, 'folksonomies', other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies.

    The SKOS Core Vocabulary is an application of the Resource Description Framework (RDF), that can be used to express a concept scheme as an RDF graph. Using RDF allows data to be linked to and/or merged with other data, enabling data sources to be distributed across the web, but still be meaningfully composed and integrated.

    - SKOS Core Vocabulary Specification

Although these efforts are proceeding well, no adopted standards appear to be imminent. In the meantime, although one can hope for a documented interchange format, about all that one can reasonably require is a readable (e.g., XML) file format. Now, let's look at some of the available offerings...

Concept Maps

For informal modeling, I would suggest the use of concept maps. They aren't overloaded with notation, but they have enough structure to capture the basic entities and relationships of a system. Unlike mind maps, concept maps aren't restricted to hierarchies.

Dr. John Sowa has written a nice overview of Concept Mapping, covering concept maps, conceptual graphs, topic maps, etc. The classic reference on concept maps is Learning How to Learn.

Data(base) Modeling

The database community uses assorted variations on entity-relationship diagrams (ERDs). Unfortunately, these can cause the modeler to focus on low-level (e.g., database-related) issues, rather than high-level concepts.

In addition, ERDs can run into difficulties when a relationship needs to be treated as an entity. If we say "Romeo loves Juliet", how do we discuss the different meanings of "loves"? Modeling Methodologies is a good introduction to some of these issues.
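
To make the "relationship as entity" problem concrete, here is a small Python sketch. It is not ERD or ORM2 notation, just an assumed illustration; the classes `Entity` and `Relationship`, the `notes` field, and the `describe` helper are hypothetical names chosen for this example. The point is that once the relationship is an object in its own right, it can carry its own properties.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str

@dataclass
class Relationship:
    """A relationship that is itself an object, so it can carry details."""
    subject: Entity
    kind: str
    target: Entity
    notes: dict = field(default_factory=dict)

def describe(rel):
    """Render the relationship along with the sense in which it holds."""
    sense = rel.notes.get("sense", "unspecified")
    return f"{rel.subject.name} {rel.kind} {rel.target.name} ({sense})"

romeo = Entity("Romeo")
juliet = Entity("Juliet")

loves = Relationship(romeo, "loves", juliet)
# Because the relationship is a first-class object, we can describe it.
loves.notes["sense"] = "romantic"

print(describe(loves))
```

A notation that can only draw "loves" as a line between two boxes has no place to hang the `notes`.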

Object-Role Modeling (ORM2) appears to handle these issues nicely, at some cost in notational complexity. I don't know of any Open Source ORM2 tools (though one is promised for 2006), but some gratis and inexpensive tools have emerged. Some versions of Microsoft's Visio handle various aspects of ORM2 diagrams. For more information on ORM2, visit www.orm.net.

Unified Modeling Language (UML) Class Diagrams are also complex, but they may appeal to programmers who are already familiar with this notation. UML is very well documented and many supporting tools are available for it.

Knowledge Engineering

The Expert Systems community has been working with problems of Knowledge Engineering for several decades. Not surprisingly, they have some useful tools to offer.

I'm particularly interested in Protégé. As described in An AI tool for the real world, Protégé is an Open Source, well-supported, standards-friendly tool for creating models, ontologies, etc.

Conceptual Graphs (CGs) are structurally similar to ORM2 diagrams, but they are based on a form of predicate calculus known as first-order logic (FOL). So, they are a good match for expert system technology.

It's quite likely that Protégé could be augmented to support CGs, ORM2, or other diagramming notations. This might ease the recognition and specification of complex sets of relationships.

Semantic Web

The Semantic Web community is developing standards (e.g., Resource Description Framework, Topic Maps) for describing concepts, encoding document metadata, etc. The standards are still "works in progress", but they are worth watching because of their large and active developer communities.

Resource Description Framework (RDF) is based on sets of three-part declarations (i.e., "triples"): subject, predicate, object. The apparent simplicity of this approach is balanced by the need to create large numbers of triples when complex concepts need to be expressed.
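
The trade-off can be sketched in a few lines of Python. This is a simplified illustration using plain tuples and strings, not a real RDF library or real URIs; the sample data, the intermediate node `job1`, and the `match` helper are hypothetical. Notice that a statement with one extra detail (a schedule) no longer fits in a single triple and has to be broken into several.

```python
# Each statement is a (subject, predicate, object) tuple. Plain strings
# stand in for the URIs a real RDF store would use.
triples = {
    ("backup.sh", "type", "Script"),
    ("backup.sh", "writes", "/var/backups"),
    # "cron runs backup.sh nightly" needs an intermediate node ("job1"),
    # because a single triple has no room for the schedule.
    ("job1", "type", "ScheduledJob"),
    ("job1", "scheduler", "cron"),
    ("job1", "command", "backup.sh"),
    ("job1", "schedule", "nightly"),
}

def match(triples, subject=None, predicate=None, obj=None):
    """Return sorted triples matching the pattern; None is a wildcard."""
    return sorted(
        (s, p, o) for (s, p, o) in triples
        if subject in (None, s)
        and predicate in (None, p)
        and obj in (None, o)
    )

# Everything said about the scheduled job.
print(match(triples, subject="job1"))
# Every statement that points at backup.sh.
print(match(triples, obj="backup.sh"))
```

The uniform three-part shape is what makes the generic `match` function possible; the cost is the extra bookkeeping node and the four triples needed to say one sentence.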

Topic Maps use a much richer vocabulary, including terms such as association, name, occurrence, scope, topic, etc. This allows relatively small expressions to express complex concepts, conditional assertions, etc. A gratis tool (Ontopia Omnigator) is available; Open Source tools are under development.

Caveats

Although the use of models is central to MBD, modeling is a tool, rather than a goal. If you develop a crystal-clear model, but generate no documentation, you haven't really accomplished your objective.

So, curb your desire to generate the "perfect" model. Instead, try to generate a useful and flexible model, improving it as you proceed. As you work with the model, you'll find areas that could use clarification, expansion, etc. Your modeling approach should make it easy to make these changes.

Also, avoid the temptation to "start at the bottom", detailing every data item, field, etc. This sort of information can be researched as it is needed, but it doesn't serve the general purposes of the model: finding useful information, understanding the system, etc.
