MSc Project: Notions of modularity for ontologies

Background

In computer science, ontologies are used to provide a common vocabulary for a domain of interest, together with descriptions of the meaning of terms built from the vocabulary and of the relationships between them. Ontologies in this sense are increasingly used in knowledge management systems and in medical informatics and bioinformatics, and are set to play a key role in the Semantic Web and the Grid. In order to be computer-accessible, modern ontologies are formulated in an ontology language based on description logics, such as OWL.
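As an illustration (the names here are invented for exposition, not taken from any actual ontology), a single description logic axiom can fix the meaning of a term built from the vocabulary:

\[ \mathit{GrilledFishDish} \;\equiv\; \mathit{Dish} \sqcap \exists \mathit{hasIngredient}.\mathit{Fish} \sqcap \exists \mathit{preparedBy}.\mathit{Grilling} \]

which states that the grilled fish dishes are exactly those dishes that have some fish among their ingredients and are prepared by grilling.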

Current applications are leading to the development of large and complex ontologies (sometimes with more than 300,000 different terms). Engineering and maintaining such ontologies is a complex task, and it has to be carried out with care for the ontology to be of use. It may involve a group of ontology engineers and domain experts co-operating to design the ontology, update it to reflect changes and developments in the domain, and integrate it with other ontologies so as to cover larger domains. For example, the National Cancer Institute (NCI) has approximately twelve people working on its oncology ontology at any given time. These people are geographically distributed and range from dedicated ontologists to managers who contribute an occasional change. In the last eighteen months, the number of classes in the ontology has grown from approximately 40,000 to over 57,000 (in addition to many changes to existing terms).

The two key advantages of a description logic based ontology language such as OWL over alternative representation mechanisms (such as semantic nets or frames) are that it has an unambiguous semantics, and that the reasoning services of description logic (DL) reasoners can be exploited for ontology engineering. These services typically include computing the subsumption hierarchy between classes (classification), answering queries, testing the consistency of class descriptions, and finding explanations for inconsistencies.
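For instance (a toy example constructed for illustration), classification makes implicit subsumptions explicit: from the axioms

\[ \mathit{Carcinoma} \sqsubseteq \mathit{MalignantNeoplasm}, \qquad \mathit{MalignantNeoplasm} \sqsubseteq \mathit{Disease}, \]

a DL reasoner infers the subsumption $\mathit{Carcinoma} \sqsubseteq \mathit{Disease}$ and places it in the class hierarchy; similarly, a class description such as $\mathit{Disease} \sqcap \lnot\mathit{Disease}$ would be reported as unsatisfiable.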

The availability of these services, especially in editors such as SWOOP and Protégé OWL, changes how ontology engineers work. Classification, for example, facilitates the bottom-up construction of taxonomies: the ontology engineer can focus on the definition of terms, rather than on the relations between them.

However, these services are not sufficient for engineering and maintaining large ontologies, especially in the collaborative case. Local changes to an ontology, and interactions between such changes, can have highly non-local effects that are currently unpredictable. These effects can only be examined after the changes have been made, in the light of all the proposed changes. And even then there are changes whose impact is not detectable using the current suite of reasoning services: for example, they might only affect subsumptions between certain complex terms, or interact unexpectedly with future additions, which makes them dangerous ``time bombs''. This has led the NCI modeling team to put stringent, though ad hoc, laborious and still error-prone, restrictions on their work process: there is a procedure for ``checking out'' a class; overnight, the ontology is classified; the effects of changes are studied the next day; finally, the changes must be approved by an editor before they are incorporated into the ontology.
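A schematic example of such a hidden effect (constructed for illustration): suppose the ontology contains the axiom $A \sqsubseteq \exists r.B$ and an editor adds the axiom $B \sqsubseteq \exists s.B$. The extended ontology entails the new subsumption

\[ A \sqsubseteq \exists r.\exists s.B, \]

which the original ontology does not, yet no subsumption between the named classes $A$ and $B$ changes, so the effect is invisible in the classified hierarchy.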

A further problem is that ontologies are published and used as they are developed: as monolithic entities. For example, the NCI ontology is focused on cancers and genes, yet it also contains an impoverished ontology of cooking. This area is obviously of only tangential interest to the NCI: they just need to talk about the risks for certain cancers associated with char-grilled fish. Such a fragment could be used and developed by other groups, but only if it can be correctly separated from the rest of the ontology and extensions safely merged back in.

To sum up, the state of ontology engineering is very similar to the state of software engineering before the advent of structured programming techniques: ontologies cannot be decomposed into semantically distinct components, the scope of a (local) change cannot be predicted, and how to re-use parts of ontologies or safely compose them are open problems. In software engineering, human documentation and rigorous process restrictions were put into place, as well as preliminary mechanisms for structuring programs (e.g., type and module systems). As these mechanisms have grown more sophisticated, they have led to new automated techniques for transforming programs for performance (e.g., separate compilation), understanding (e.g., refactoring), and re-use (e.g., modules).

It has been convincingly argued that methodologies and algorithmic support for composing and decomposing ontologies in a controlled way will be the key to supporting collaborative ontology engineering and re-use. More precisely, it will be crucial to develop methodologies and algorithmic support for

  1. developing ontologies with interfaces (and acceptable restrictions on their usage) which guarantee that, if such an ontology is composed with other ontologies, it neither corrupts nor is corrupted by the ontologies it is composed with;
  2. evaluating the consequences of the composition of a set of given ontologies which may have been built in a completely unrestricted way;
  3. decomposing a large ontology into modules that can be edited in a controlled way.
The first item is prescriptive in the sense that the developer will be ``forced'' to follow a certain design method which automatically leads to well-behaved modular ontologies. The second item is analytical in that it supports the evaluation of the result of composing arbitrary ontologies. The last item comprises both approaches: after automatically segmenting a given ontology analytically, the designer might want to follow a certain methodology to ensure that editing the segment does not have damaging effects or, alternatively, might prefer to edit the segment ``arbitrarily'' but be supported in evaluating the effects of merging it back.
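What ``corrupting'' means in item 1 can be made concrete with a small constructed example: let $O_1 = \{A \sqsubseteq B\}$ and $O_2 = \{B \sqsubseteq C,\; C \sqsubseteq A\}$. Although $O_2$ only adds axioms mentioning the fresh symbol $C$, the composition $O_1 \cup O_2$ entails

\[ B \sqsubseteq A, \]

a new consequence formulated entirely in the vocabulary of $O_1$; the composition thus changes the meaning of $O_1$'s own terms.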

Satisfactory solutions to these problems are indispensable to support ontology engineers in the collaborative editing and re-use of ontologies. For example, they would allow for the simultaneous editing of an ontology by segmenting it into modules in such a way that reasoning can be carried out locally and independently on each module, and that local changes can corrupt neither the whole ontology nor other modules. No understanding of the whole ontology would be required while editing a module. As a valuable side-effect, segmenting large ontologies is seen as a key new optimization technique for reasoning, allowing both for larger ontologies to be processed and for processing that is closer to real time.

Project

The objective of this project is to analyse the applicability of conservative extensions to developing notions of modularity for weak ontology languages that correspond, in expressive power, to Entity-Relationship and UML diagrams.
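The central notion here, in its standard deductive form (the project may also consider model-theoretic variants), is the following: an ontology $O' \supseteq O$ is a conservative extension of $O$ with respect to a signature $\Sigma$ if, for every inclusion $C \sqsubseteq D$ formulated using only symbols from $\Sigma$,

\[ O' \models C \sqsubseteq D \quad\text{iff}\quad O \models C \sqsubseteq D. \]

Taking $\Sigma$ to be the signature of $O$, this captures the requirement that composing or extending $O$ adds no new consequences expressible in $O$'s own vocabulary, which is exactly the notion of non-corruption discussed above.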