3 The OIL Language

3.1 An informal description of OIL

In this section we will give an informal description of the OIL language. An example is provided in Section 3.2, and a formal specification and semantics (both of the language and of the common inference problems) will be given in Appendix C. To improve readability, we will use a more compact pseudo-XML syntax in which opening tags are indicated by boldface text, grouping of sub-content is indicated by indentation, and closing tags are omitted.

An OIL ontology is a structure made up of several components, some of which may themselves be structures, some of which are optional, and some of which may be repeated. We will write component ? to indicate an optional component, component + to indicate a component that may be repeated one or more times (i.e., that must occur at least once) and component * to indicate a component that may be repeated zero or more times (i.e., that may be completely omitted).

When describing ontologies in OIL we have to distinguish three different layers:

3.1.1 Ontology Container

    We adopt the components as defined by the Dublin Core Metadata Element Set, Version 1.1 for the ontology container part of OIL. Although every element in the Dublin Core set is optional and repeatable, in OIL some elements are required or have a predefined value. Required elements are written as element +. Some of the elements can be specialized with a qualifier which refines the meaning of that element. In our shorthand notation we will write element.qualifier. The precise syntax based on RDF is given in [Miller et al., 1999] and in the appendix. Here we provide our pseudo-XML syntax explained above.


    • title + The name of the ontology, e.g., "African animals".

    • creator + The name of an agent (i.e., a person, a group of persons, or a software agent) that created the ontology.

    • subject * Keywords or classification code describing the subject the ontology deals with.

    • description Natural language text describing the content of the ontology, e.g., "A didactic example ontology describing African animals". Besides this description, there is one special description element required, which has the release qualifier:
      • description.release The version of the ontology (a number), e.g., 1.01.
    • publisher * Defining the entity that is responsible for making the resource available.

    • contributor * The name of an agent (i.e., a person, a group of persons, or a software agent) that helped to create the ontology.

    • date * The date the ontology has been created, modified, or made available (see ISO 8601 for format instructions).

    • type + The nature of the resource. A predefined and required value is ontology, although this value is not yet in the Working Draft of the resource types [Guenther, 1999].

    • format * The digital manifestation of the resource. The recommended value is the MIME type of the resource, i.e., "text/xml".

    • identifier + The URI of the ontology.

    • source * Optional references (URI) to sources from which the ontology is derived. E.g., a reference to a plain text description of the domain on which the ontology is based.

    • language + The language of the ontology. Obviously, one predefined and required value is "OIL". Other elements can contain the language of the content of the ontology, according to RFC 1766.

    • relation * A list of references to other OIL ontologies. It is recommended to list all ontologies that are imported in the definition section with a hasPart qualifier. Other possible and meaningful qualifiers are replaces, isReplacedBy, requires and isRequiredBy. For example, to list an imported ontology, we write: relation.hasPart "http://www.ontosRus.com/animals/jungle.onto".

    • rights * Information about rights held in and over the ontology.
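The occurrence markers used in the element list above can be checked mechanically. The following is a minimal Python sketch, not part of OIL or of any OIL tool: the names CONTAINER_SPEC, PREDEFINED and check_container are hypothetical, the creator name and identifier URI in the example are made up, and the plain description element (which carries no marker in the list) is assumed here to be optional.

```python
# Occurrence markers: "+" one or more, "*" zero or more, "?" zero or one.
# Assumption: "description" has no marker in the text and is treated as "?";
# "description.release" is described as required and is treated as "+".
CONTAINER_SPEC = {
    "title": "+", "creator": "+", "subject": "*",
    "description": "?", "description.release": "+",
    "publisher": "*", "contributor": "*", "date": "*",
    "type": "+", "format": "*", "identifier": "+",
    "source": "*", "language": "+", "relation": "*", "rights": "*",
}

# Predefined values that must appear among the values of an element.
PREDEFINED = {"type": "ontology", "language": "OIL"}


def check_container(elements):
    """Return a list of problems for a list of (name, value) pairs."""
    problems = []
    values = {}
    for name, value in elements:
        # "relation.hasPart" counts as "relation", unless the qualified
        # name is itself listed in the spec ("description.release").
        key = name if name in CONTAINER_SPEC else name.split(".", 1)[0]
        if key not in CONTAINER_SPEC:
            problems.append(f"unknown element: {name}")
            continue
        values.setdefault(key, []).append(value)
    for key, marker in CONTAINER_SPEC.items():
        count = len(values.get(key, []))
        if marker == "+" and count == 0:
            problems.append(f"missing required element: {key}")
        elif marker == "?" and count > 1:
            problems.append(f"element occurs more than once: {key}")
    for key, required in PREDEFINED.items():
        if key in values and required not in values[key]:
            problems.append(f"{key} must include the value {required!r}")
    return problems


example = [
    ("title", "African animals"),
    ("creator", "A. N. Author"),  # hypothetical creator
    ("description", "A didactic example ontology describing African animals"),
    ("description.release", "1.01"),
    ("type", "ontology"),
    ("format", "text/xml"),
    ("identifier", "http://www.example.org/african-animals.onto"),  # hypothetical URI
    ("language", "OIL"),
    ("language", "en"),
    ("relation.hasPart", "http://www.ontosRus.com/animals/jungle.onto"),
]

print(check_container(example))
```

Representing the container as a list of pairs rather than a dictionary keeps repeated elements (such as the two language entries) visible, which is what the + and * markers are about.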

     

3.1.2 An Ontology specified in OIL

    Apart from various header fields encapsulated in its container, an OIL ontology consists of a set of definitions:
    A class definition (class-def) associates a class name with a class description. A class-def consists of the following components:
    A class-expression can be either a class name, a slot-constraint, or a boolean combination of class expressions using the operators AND, OR or NOT. The structure of these boolean combinations is as follows:

Note that class expressions are recursively defined, so that arbitrarily complex expressions can be formed. For example

NOT (Meat OR Fish)

defines the class whose instances are all those individuals that are not instances of either the class Meat or the class Fish.
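The recursive structure of class expressions can be sketched as a small Python data type. This is only an illustration under stated assumptions: the class names Name, Not, And, Or and the function is_instance are hypothetical, slot-constraints are left out, and the evaluation is carried out against one fixed, finite assignment of individuals to class names, which is much weaker than the formal semantics of Appendix C or the reasoning performed by a DL classifier.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Name:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


# A class expression is a class name or a boolean combination of expressions.
Expr = Union[Name, Not, And, Or]


def is_instance(individual, expr, memberships):
    """Evaluate expr for one individual; memberships maps class name -> set."""
    if isinstance(expr, Name):
        return individual in memberships.get(expr.name, set())
    if isinstance(expr, Not):
        return not is_instance(individual, expr.arg, memberships)
    if isinstance(expr, And):
        return (is_instance(individual, expr.left, memberships)
                and is_instance(individual, expr.right, memberships))
    if isinstance(expr, Or):
        return (is_instance(individual, expr.left, memberships)
                or is_instance(individual, expr.right, memberships))
    raise TypeError(f"not a class expression: {expr!r}")


# NOT (Meat OR Fish), with made-up individuals.
memberships = {"Meat": {"steak"}, "Fish": {"cod"}}
neither = Not(Or(Name("Meat"), Name("Fish")))
print(is_instance("carrot", neither, memberships))
print(is_instance("cod", neither, memberships))
```

Because each constructor takes expressions as arguments, arbitrarily deep nesting such as Not(And(Name("A"), Or(Name("B"), Name("C")))) needs no extra machinery.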

    A slot-constraint (a slot may also be called a role or an attribute) is a list of one or more constraints (restrictions) applied to a slot. A slot is a binary relation (i.e., its instances are pairs of individuals), but a slot-constraint is actually a class definition: its instances are those individuals that satisfy the constraint(s). For example, if the pair (Leo, Willie) is an instance of the slot eats, Leo is an instance of the class lion and Willie is an instance of the class wildebeest, then Leo is also an instance of the value constraint wildebeest applied to the slot eats. A slot-constraint consists of the following components:

    A slot definition (slot-def) associates a slot name with a slot description. A slot description specifies global constraints that apply to the slot relation, for example that it is a transitive relation. A slot-def consists of the following components:
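Independently of the component lists, the difference between a slot (a set of pairs), a slot-constraint (a class) and a global slot property such as transitivity can be illustrated with a minimal Python sketch. The function names has_value and transitive_closure and the slot part_of are hypothetical, and the value constraint from the Leo and Willie example is read here as "has at least one filler in the given class", which is an assumption about the intended reading.

```python
# A slot is a binary relation: a set of (individual, individual) pairs.
eats = {("Leo", "Willie")}
classes = {"lion": {"Leo"}, "wildebeest": {"Willie"}}


def has_value(slot, filler_class, classes):
    """Instances of the slot-constraint: individuals with at least one
    slot filler that belongs to filler_class (assumed existential reading)."""
    fillers = classes[filler_class]
    return {x for (x, y) in slot if y in fillers}


def transitive_closure(pairs):
    """Smallest transitive relation containing pairs, as a global
    constraint on a slot declared transitive would require."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# The slot-constraint is a class: its extension is a set of individuals.
print(has_value(eats, "wildebeest", classes))

# A hypothetical transitive slot.
part_of = {("finger", "hand"), ("hand", "arm")}
print(("finger", "arm") in transitive_closure(part_of))
```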

3.2 An example OIL ontology

Some points to note in the above ontology are:

3.3 Tools

One of the major benefits of OIL is that it comes with a range of tools to support ontology design, exchange, integration and verification. In particular, it is possible to use the FaCT reasoner to check the consistency of all the class definitions in an ontology, and to discover sub-class/super-class (subsumption) relations that are implied by the definitions in the ontology but not explicitly stated. FaCT (Fast Classification of Terminologies) is a Description Logic (DL) classifier that can also be used for consistency checking in modal and other similar logics. The FaCT system includes two reasoners, one for the logic SHF and the other for the logic SHIQ, both of which use optimized implementations of sound and complete tableaux algorithms. FaCT's most interesting features are its expressive logic (in particular the SHIQ reasoner), its optimized tableaux implementation (which has now become the standard for DL systems), and its CORBA based client-server architecture.

3.3.1 Background

The logic implemented in FaCT is based on ALCR+, an extension of ALC to include transitive roles [Sattler, 1996]. For compactness, this logic has been called S (due to its relationship with the propositional multi-modal logic S4(m) [Schild, 1991a]). SHF extends S with a hierarchy of roles and functional roles (attributes), while SHIQ adds inverse roles and fully qualified number restrictions. The SHIQ reasoner is of particular interest, both from a theoretical and a practical viewpoint. Adding inverse roles to SHF (resulting in SHIF) leads to the loss of the finite model property, and this has necessitated the development of a more sophisticated double dynamic blocking strategy that allows the algorithm to find finite representations of infinite models while still guaranteeing termination [Horrocks & Sattler, 1999]. Moreover, when SHIF is generalized to SHIQ, it is necessary to restrict the use of transitive roles in number restrictions in order to maintain decidability [Horrocks et al., 1999]. SHIQ is also of great practical interest as it is powerful enough to encode the logic DLR, and can thus be used for reasoning about a wide range of conceptual data models, e.g., Extended Entity-Relationship (EER) schemas [Calvanese et al., 1998a].

3.3.2 Implementation

FaCT is implemented in Common Lisp, and has been run successfully with several commercial and free Lisps, including Allegro, Liquid (formerly Lucid), LispWorks and GNU. Binaries (executable code) are now available (in addition to the source code) for Linux and Windows systems, allowing FaCT to be used without a locally available Lisp. In order to make the FaCT system usable in realistic applications, a wide range of optimization techniques are used to implement the satisfiability testing algorithms. These include axiom absorption, lexical normalization, semantic branching search, simplification, dependency directed backtracking, heuristic guided search and caching [Horrocks & Patel-Schneider, 1999]. The use of these (and other) optimization techniques has now become standard in tableaux-based DL implementations [Patel-Schneider, 1998], [Haarslev et al., 1998]. Work is underway on the development of Abox reasoning for the FaCT system (reasoning with individuals): an SHF Abox has recently been released [Tessaris & Gough, 1999] and a full SHIQ Abox is being developed [Horrocks et al., submitted].

3.3.3 CORBA Interface

In addition to the standard KRSS functional interface [Patel-Schneider & Swartout, 1993] , FaCT can also be configured as a classification and reasoning server using the Object Management Group's Common Object Request Broker Architecture (CORBA) [Bechhofer et al., 1999] . This approach has several advantages: it facilitates the use of FaCT by non-Lisp client applications; the API is defined using CORBA's Interface Definition Language (IDL), which can be mapped to various target languages; a mechanism is provided for applications to communicate with the DL system, either locally or remotely; and server components can be added/substituted without client applications even being aware of the change. This has allowed, for example, the successful use of FaCT's reasoning services in a (Java based) prototype EER schema integration tool developed as part of the DWQ project [Calvanese et al., 1999].

3.3.4 Performance

FaCT's optimizations are aimed specifically at improving the system's performance when classifying realistic ontologies, and this results in a performance improvement of several orders of magnitude when compared with older DL systems. This performance improvement is often so great that it is impossible to measure precisely, as unoptimised systems are effectively non-terminating with ontologies that FaCT is easily able to deal with [Horrocks & Patel-Schneider, 1999]. Taking a large medical terminology ontology developed in the GALEN project [Rector et al., 1993] as an example, FaCT is able to check the consistency of all 2,740 classes and determine the complete class hierarchy in about 60 seconds of (450MHz Pentium III) CPU time (see footnote 3). In contrast, the KRIS system [Baader & Hollunder, 1991] was unable to complete the same task after several weeks of CPU time.

3.4 Current Limitations of OIL

Our starting point was to define a core language with the intention that additional (and possibly important) features be defined as a set of extensions (still with clearly defined semantics). Modelers will be free to use these language extensions, but it must be clear that this may compromise reasoning support. This seems to us a better solution than trying to define a single "all things to all men" language like Ontolingua. In this section we briefly discuss a number of features which are available in other ontology modeling languages and which are not, or not yet, included in OIL. For each of these features we briefly explain why we chose not to include it, and mention future prospects where relevant.

Default reasoning: Although OIL does provide a mechanism for inheriting values from super-classes, such values cannot be overwritten. As a result, such values cannot be used for the purpose of modeling default values. If an attempt is made at "overwriting" an inherited attribute value, this will simply result in inconsistent class definitions which have an empty extension. For example, if we define the class "CS professor" with attribute "gender" and value "male", and we subsequently define a subclass for which we define the gender attribute as "female", this subclass will be inconsistent and have an empty extension (assuming that "male" and "female" are disjoint).
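The effect described above can be sketched in a few lines of Python. This is a deliberately simplified model and not OIL's semantics: a class is reduced to a mapping from attribute names to sets of allowed values, every instance is assumed to need a value for each constrained attribute, and "male" and "female" are disjoint simply because they are distinct atoms. The names combine and is_satisfiable are hypothetical.

```python
def combine(*constraint_sets):
    """Conjoin constraints; each maps an attribute to its allowed values.
    A subclass inherits every constraint, so allowed values are intersected
    rather than overwritten."""
    merged = {}
    for constraints in constraint_sets:
        for attr, allowed in constraints.items():
            merged[attr] = merged.get(attr, set(allowed)) & set(allowed)
    return merged


def is_satisfiable(constraints):
    """A class can have instances only if every attribute has some allowed value."""
    return all(allowed for allowed in constraints.values())


cs_professor = {"gender": {"male"}}
# Attempted "override" in a subclass: inherited and local constraints both apply.
female_cs_professor = combine(cs_professor, {"gender": {"female"}})

print(female_cs_professor)
print(is_satisfiable(cs_professor), is_satisfiable(female_cs_professor))
```

A language with defaults would instead let the more specific constraint replace the inherited one, which is exactly what the intersection in combine rules out.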

Rules/Axioms: As discussed above, only a fixed number of algebraic properties of slots can be expressed in OIL. There is no facility for describing arbitrary axioms that must hold for the items in the ontology. Such a powerful feature is undoubtedly useful. The use of OIL as an exchange language further justifies a more powerful axiom-language as you might need such axioms to enforce the correct interpretation of the source ontology when mapping into OIL. The lack of such an axiom-language is somewhat mitigated in OIL by the fact that we have a powerful concept and slot definition language. The main limitation is that we do not have composite definitions of relations. However, there is currently no broad support for any particular choice in this matter. The main problems in this area are first, that it is difficult to identify a common set of rule/axiom expressions that can be standardized, and second, that you have to define properly how these axioms can be integrated with the other modeling primitives of OIL.

Further algebraic properties: The lack of an axiom language can also be compensated for somewhat by extending the set of properties that can be specified for relations in OIL. Currently this set contains inverse, transitivity and symmetry. Other reasonable candidates are reflexivity, irreflexivity, antisymmetry, asymmetry, linearity (aRb or bRa for any pair a,b), connectivity (aRb or a=b or bRa for any pair a,b), partial order and total order. Notice that some of these can be defined in terms of each other; cf. [Staab & Mädche, 2000].
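For a finite relation given as a set of pairs, the candidate properties listed above can be tested directly. The Python sketch below uses hypothetical function names; linearity is taken to mean "aRb or bRa for any pair a,b", and partial and total order follow the usual textbook definitions (reflexive, antisymmetric and transitive, plus linear for a total order), which is an assumption about the intended meaning.

```python
from itertools import product


def is_reflexive(r, domain):
    return all((a, a) in r for a in domain)


def is_irreflexive(r, domain):
    return all((a, a) not in r for a in domain)


def is_symmetric(r):
    return all((b, a) in r for (a, b) in r)


def is_antisymmetric(r):
    return all(a == b or (b, a) not in r for (a, b) in r)


def is_asymmetric(r):
    return all((b, a) not in r for (a, b) in r)


def is_transitive(r):
    return all((a, d) in r for (a, b) in r for (c, d) in r if b == c)


def is_linear(r, domain):
    # aRb or bRa for any pair a, b (including a == b).
    return all((a, b) in r or (b, a) in r for a, b in product(domain, repeat=2))


def is_connected(r, domain):
    # aRb or a = b or bRa for any pair a, b.
    return all((a, b) in r or a == b or (b, a) in r
               for a, b in product(domain, repeat=2))


def is_partial_order(r, domain):
    return is_reflexive(r, domain) and is_antisymmetric(r) and is_transitive(r)


def is_total_order(r, domain):
    # A total order is a partial order that is also linear.
    return is_partial_order(r, domain) and is_linear(r, domain)


domain = {1, 2, 3}
leq = {(a, b) for a, b in product(domain, repeat=2) if a <= b}
lt = {(a, b) for a, b in product(domain, repeat=2) if a < b}

print(is_total_order(leq, domain), is_linear(lt, domain), is_connected(lt, domain))
```

The definitions make the interdependence visible: asymmetry implies irreflexivity and antisymmetry, linearity implies reflexivity and connectivity, and total order is defined entirely in terms of the other properties.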

Modules: Section 3.1 presented a very simple construction to modularize ontologies in OIL. In fact, this mechanism is identical to the namespace mechanism in XML and XML schema. It amounts to a textual inclusion of the imported module, where name-clashes are avoided by prefixing every imported symbol with a unique prefix indicating its original location. However, much more elaborate mechanisms would be required for the structured representation of large ontologies. Means of renaming, restructuring, and redefining imported ontologies must be available. Future extensions will cover parameterized modules, signature mappings between modules, and restricted export interfaces for modules. We will use the generic adapter concept of UPML (cf. [Fensel et al., 1999a]), specialized to the fixed set of language primitives of OIL, much as [Gennari et al., 1994] and [Park et al., 1997] have done for the fixed set of language primitives of Protégé.
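The inclusion-with-prefix mechanism can be sketched as follows. The representation is hypothetical and far simpler than a real OIL ontology: an ontology is a dictionary from class names to lists of superclass names, and the ":" separator is borrowed from XML namespace prefixes as an assumption. The function names prefix_symbols and include are likewise made up for this illustration.

```python
def prefix_symbols(definitions, prefix):
    """Return a copy of an imported ontology with every symbol prefixed."""
    def rename(symbol):
        return f"{prefix}:{symbol}"
    return {rename(name): [rename(s) for s in supers]
            for name, supers in definitions.items()}


def include(local, imported, prefix):
    """Textual inclusion: add the prefixed imported definitions to the
    local ones. Prefixing keeps equally named symbols apart."""
    merged = dict(local)
    merged.update(prefix_symbols(imported, prefix))
    return merged


local = {"animal": [], "lion": ["animal"]}
jungle = {"animal": [], "monkey": ["animal"]}  # imported module, made up

merged = include(local, jungle, "jungle")
print(sorted(merged))
```

Note that the local animal and the imported jungle:animal remain unrelated after inclusion; stating that they denote the same class would need one of the richer mechanisms (renaming, signature mappings) mentioned above.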

Using instances in class definitions: Results from research in description and modal logics show that the computational complexity of such logics changes dramatically for the worse when domain-instances are allowed in class definitions [Schaerf, 1994], [Blackburn & Seligman, 1998], [Areces et al., 1999]. For this reason, OIL currently does not allow the use of instances in slot-values, or extensional definitions of classes (i.e., class definitions by enumerating the class instances). It is not clear how serious a restriction this is for an ontology language, as ontologies should, in general, be independent of specific instantiations; it may be that in many cases, "individuals" can more correctly be replaced with a primitive class or classes.

Concrete domains: OIL currently does not support concrete domains (e.g., integers, strings, etc.). This would seem to be a serious limitation for a realistic ontology language, and extensions of OIL in this direction are probably required. The theory of concrete domains is well understood [Baader & Hanschke, 1991] , and it should be possible to add some restricted form of concrete domains (but still with greater expressive power than XOL's numeric-minimum and numeric-maximum slot constraints) to OIL's core language without compromising its decidability (but a corresponding extension to the FaCT system would also be required if reasoning support is to be provided).

Limited Second-order expressivity: Many existing languages for ontologies (KIF, CycL [Lenat & Guha, 1990], Ontolingua) include some form of reification mechanism, which allows the treatment of statements of the language as objects in their own right, thereby making it possible to express statements over these statements. A full second order extension would be clearly undesirable (even unification is undecidable in full second order logic). However, much weaker second order constructions already provide much if not all of the required expressivity without causing any computational problems (in effect, they are simply second order syntactic sugar for what are essentially first order constructions). A precise characterization of such expressivity is required in a future extension of OIL. OIL is currently very restricted in this respect: only classes are provided, not meta-classes or individuals.

 


1. This definition is embryonal. See Section 3.4 for more details.

2. This definition is embryonal. See Section 3.4 for more details.

3. Adding single classes and checking both their consistency and their position in the class hierarchy is virtually instantaneous.