Develop the ontology

Submitted by rnsrk on Wed, 08/25/2021 - 15:04

The ontology is a fundamental part of WissKI. It determines which statements can be made about the data and along which "paths" you can traverse the semantics. A detailed introduction is beyond the scope of this guide, but the paper "Knowledge Organization and Data Modeling in the Humanities" by Julia Flanders and Fotis Jannidis is a good starting point.

In this guide we will use a fictional museum with two collections as an example. It will have only a few individuals and a very limited scope, but it should demonstrate the basic requirements of a research data lifecycle.

Ontology engineering, like every form of data modeling, is an agile process: your ontology must adapt as your knowledge of the material grows, and there is no single right way to represent reality digitally. There are, however, a few guidelines that increase the expressiveness and usability of your model.

Modeling fundamentals

Every ontology needs a theoretical foundation of perspectives, questions and functionalities. Have user stories ready for their development and evaluation.
Reality is complex; try to build a simplified representation that answers your questions and enables the features you want. No more, no less.
Distinguish between
- entities (the person who wrote the book Systema Naturae) and their appellations (Carl von Linné, Carolus Linnaeus, or Karl Linné),
- objects you can touch (the nearly complete fossil of Plesiosaurus dolichodeirus at the Natural History Museum in London), concepts of these objects (the species Plesiosaurus dolichodeirus as a scientific aggregation of all characteristics of the biological entity), and their representations (the image of the Berlin specimen of Archaeopteryx lithographica on Wikipedia),
- source data from observations and transcriptions ("born on the first day of March in the year of the enthronement of Louis XIV") and normalisations (01.03.1643).
Visualize your ontology with entity-relation schemas and document your entities and their attributes in a meaningful manner.
Where possible, work in iterations from the most fundamental needs (what? who? where? when? - objects, persons, places, dates) to additional, often time-consuming questions (how? why? whereby? - processes, causalities, dependencies).
Prefer extending what already exists over inventing something new.
Discuss your ontology in internal and external contexts.
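
Two of the distinctions above (entity versus appellation, source wording versus normalisation) can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, field names, and the identifier "person/1" are hypothetical and not part of WissKI or CIDOC CRM.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class PersonEntity:
    """The entity itself, identified independently of any of its names."""
    identifier: str
    appellations: list[str] = field(default_factory=list)


@dataclass
class DateStatement:
    """Keeps the transcribed source wording next to its normalisation."""
    source_text: str
    normalised: date


def parse_normalised(text: str) -> date:
    """Parse a DD.MM.YYYY string, the notation used in the guideline above."""
    return datetime.strptime(text, "%d.%m.%Y").date()


# One entity, several appellations (identifier is a hypothetical placeholder).
linnaeus = PersonEntity(
    identifier="person/1",
    appellations=["Carl von Linné", "Carolus Linnaeus", "Karl Linné"],
)

# The source wording is preserved; the normalisation is stored alongside it.
birth = DateStatement(
    source_text=(
        "born on the first day of march in the year of "
        "the enthronement of Louis the 14th"
    ),
    normalised=parse_normalised("01.03.1643"),
)

print(linnaeus.appellations)
print(birth.normalised.isoformat())
```

Keeping both the source text and the normalised value means a later correction of the normalisation never destroys what the source actually says.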

Identify entities and their attributes

The first step is to identify the things we have to deal with and their characteristics, based on the existing material and the issues or requirements for that material. Our example museum has two collections: a biological collection containing a fossil of the species Plesiosaurus dolichodeirus, a preparation of a Galápagos tortoise (Chelonoidis porteri), and a preparation of "Bobby", the first gorilla of the Berlin Zoo; and an art collection containing artworks by Hilda Belcher and Meret Oppenheim. Obviously we have specimens, artworks, and persons. What do we want to know about them?

Let's say we want the scientific name and vernacular names of the specimens and the place where they were found. The scientific name consists of two parts: the name itself and the bibliographic reference of the first description of that taxon. The bibliographic reference in turn has an author and bibliographical information. You see how fast data expands as you try to structure it. And this is just the first collection.

The art collection contains the painting The Checkered Dress by Hilda Belcher, as well as Der grüne Zuschauer (The Green Observer) and Tisch mit Vogelfüßen (Table with Bird Feet) by Meret Oppenheim. We want the titles and images of the artworks and when they were created, and we would like to know when and where the artists were born and died. Table 1 shows the information about the objects.

Table 1: Information about the objects of the museum

Therefore, the complete ontology so far must contain:

Collections with their titles.
Persons with name, birthplace, birthdate, deathplace, and deathdate.
Artworks with artist, title, image, and timespan of creation.
Specimens with taxonomic declaration and encounter location.
Bibliographical references with author, title, edition, publication place, publication year, etc. Since bibliographical information is quite complex and we only want to use the authority to identify the taxon, we simplify it to author and sigle (a short citation such as "Conybeare, 1824") and sum up the remaining data under bibliographical information.
Places with place name and coordinates.
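
To see how the entities in this list nest inside each other, the following minimal sketch writes part of the inventory down as Python dataclasses and fills it with the Plesiosaurus specimen. All class and field names are assumptions chosen for illustration; values that the text does not give (vernacular names, encounter location, further author details) are deliberately left empty.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    name: str
    coordinates: Optional[tuple[float, float]] = None  # (latitude, longitude)


@dataclass
class Person:
    name: str
    birthplace: Optional[Place] = None
    birthdate: Optional[str] = None
    deathplace: Optional[Place] = None
    deathdate: Optional[str] = None


@dataclass
class BibliographicReference:
    author: Person
    sigle: str  # short citation that identifies the first description
    bibliographical_information: Optional[str] = None


@dataclass
class Taxon:
    scientific_name: str
    first_description: BibliographicReference


@dataclass
class Specimen:
    taxon: Taxon
    vernacular_names: list[str] = field(default_factory=list)
    encounter_location: Optional[Place] = None


# Only values stated in this guide are filled in; the rest stays empty.
plesiosaur = Specimen(
    taxon=Taxon(
        scientific_name="Plesiosaurus dolichodeirus",
        first_description=BibliographicReference(
            author=Person(name="Conybeare"),
            sigle="(Conybeare, 1824)",
        ),
    ),
)

print(plesiosaur.taxon.scientific_name)
print(plesiosaur.taxon.first_description.sigle)
```

Note that the author of the first description is a Person like any artist, which is one reason to model persons as entities of their own rather than as plain text fields.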

Map with CIDOC CRM classes and extend CIDOC CRM

The second step is to find the classes in the CIDOC CRM manual that correspond to our concepts and map them. Mapping to CIDOC CRM classes is a rather philosophical task: you have to think about what an entity is, what features it bears, and whether these features are themselves entities with their own characteristics. You will find that CIDOC CRM is not that specific; it talks about Human-Made Thing, not about artworks. In many cases the abstract class is sufficient, but some concepts need more specialisation. Figure 1 illustrates the entities and their attributes of our example collection. As you can see, there are native CIDOC CRM classes (the ones beginning with an "E") and additional classes (like Collection Title, Taxon, or Person Name).

Additional classes are always subclasses of existing CIDOC CRM entities. For example, as we see in Figure 1, we want a field with a title for our collection. CIDOC CRM provides a class called "E35 Title", which fits our semantics, but since we have artwork titles as well, we may want to distinguish between those semantics and add the two subclasses "Collection Title" and "Artwork Title". Another class that is commonly extended is "E41 Appellation". It should be extended so as not to lump all denominations together in one container. As we want to distinguish between person names and place names, we add these subclasses to our ontology. Adding subclasses is one way to increase the level of detail in the ontology.
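
The four subclasses named above can be written down as a small table of (new class, parent class) pairs. The sketch below turns such a table into Turtle statements using only plain Python. It rests on two assumptions: that the Erlangen CRM classes live under the namespace http://erlangen-crm.org/current/ (check your local file), and a made-up project namespace http://example.org/museum/ for the new classes.

```python
ECRM = "http://erlangen-crm.org/current/"  # assumed Erlangen CRM namespace
LOCAL = "http://example.org/museum/"       # hypothetical project namespace

# (new subclass, existing CIDOC CRM parent class)
SUBCLASSES = [
    ("Collection_Title", "E35_Title"),
    ("Artwork_Title", "E35_Title"),
    ("Person_Name", "E41_Appellation"),
    ("Place_Name", "E41_Appellation"),
]


def to_turtle(subclasses):
    """Render each pair as an owl:Class with an rdfs:subClassOf statement."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix ecrm: <{ECRM}> .",
        f"@prefix ex: <{LOCAL}> .",
        "",
    ]
    for child, parent in subclasses:
        lines.append(f"ex:{child} a owl:Class ;")
        lines.append(f"    rdfs:subClassOf ecrm:{parent} .")
    return "\n".join(lines)


turtle = to_turtle(SUBCLASSES)
print(turtle)
```

Writing the pairs down first, before touching any editor, makes it easy to review with colleagues which CIDOC CRM class each new concept specialises.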

To do so, open your local version of the Erlangen CRM / OWL in Protégé, navigate to the class E41 Appellation, and select it (you can open the search with Ctrl + F). Click Add subclass and choose a name and an IRI that fit your expectations. Click OK and add an annotation by clicking on the plus sign in the annotations window. You can choose a label and add a description of your class via the comment. Good practice is to provide a meaningful scope note and some examples that illustrate which values are expected. Please save your ontology in OWL DL RDF/XML notation.
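
For orientation, the result of these steps is roughly one owl:Class element with a parent, a label, and a comment. The following sketch builds such an element with Python's standard library so you can compare it with what your saved file contains. The namespaces for the Erlangen CRM and for the new class, as well as the label and scope note text, are assumptions for illustration; the exact serialisation written by Protégé may differ in layout and ordering.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
ECRM = "http://erlangen-crm.org/current/"  # assumed Erlangen CRM namespace
LOCAL = "http://example.org/museum/"       # hypothetical project namespace

# Use the conventional prefixes instead of auto-generated ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def subclass_element(iri, parent_iri, label, comment):
    """Build an owl:Class with subClassOf, label, and comment (scope note)."""
    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}about": iri})
    ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": parent_iri})
    ET.SubElement(cls, f"{{{RDFS}}}label").text = label
    ET.SubElement(cls, f"{{{RDFS}}}comment").text = comment
    return cls


root = ET.Element(f"{{{RDF}}}RDF")
root.append(
    subclass_element(
        iri=LOCAL + "Person_Name",
        parent_iri=ECRM + "E41_Appellation",
        label="Person Name",
        # Illustrative scope note with examples taken from this guide.
        comment="Names used to identify persons, "
                "e.g. Carl von Linné or Carolus Linnaeus.",
    )
)

ET.indent(root)  # pretty-printing, available from Python 3.9
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The sketch only produces the fragment for one class; in practice the class is added inside the existing ontology file, which is why editing in Protégé remains the recommended route.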

Example screencast

If you have implemented all missing classes from Figure 1, you should have added eleven new entities to the Erlangen CRM ontology (see Figure 2). You can download the ontology here if you would rather not type everything yourself. Now you are ready to import the ontology into your triple store.
