Managing annotation data

Introduction

Setting up an annotation project is a labor-intensive task that takes a lot of planning and hands-on management. This page does not try to be a handbook for running your own annotation project. If you are thinking about running one, I suggest you read as much as you can about other successful annotation projects, contact the people who ran them, and pay them large amounts of money to give you lots of advice. This page is intended to help from a mechanical/technical perspective by describing some of the features of Knowtator that make managing the annotation data easier.

Overview

All of the human annotators in our lab work part-time, for a limited number of hours on weekends and evenings. A key perk of this job is that they can work at home without being connected to the Internet. To make this possible, we elected to use the Protégé file-based (CLIPS) backend for saving the annotations that annotators create in Protégé. This persistence mechanism saves the contents of a Protégé project into three files: my-annotation-project.pprj (the project file), my-annotation-project.pont (the ontology file), and my-annotation-project.pins (the instances file). Nothing in the Knowtator code requires the CLIPS backend rather than the database backend (in fact, we have used the database backend). However, to make deployment as simple as possible on annotators' home machines, we use the file-based backend. The following documentation assumes this configuration.

This page steps through an annotation workflow that we have put in place in our lab. The basic steps are:

1. create an annotation schema
2. distribute the annotation schema to the annotators
3. annotators annotate
4. merge annotations from multiple annotators
5. (optional and discouraged) update/edit annotation schema on merged annotation project. Go back to step 2.
6. IAA metrics are run (as appropriate)
7. export annotations to XML
8. Tips & Tricks

Here is an ant script that can be modified to perform some of the tasks described below.

Create an annotation schema

I will not attempt to provide advice on how to create an annotation schema that is appropriate for your task. I encourage you to look at the annotation tasks that are provided in the examples directory and at the annotation tasks that are provided with Callisto, WordFreak, and GATE. As with any metadata, it pays to get the annotation schema for your annotation task right first, before "real" annotation begins. The more upfront effort you put into designing the annotation schema, the easier your life will be.

The Protégé project that you create will need:

Knowtator included. (see the installation page and the configuration page)
Classes/Instances defined that represent the kinds of annotations you will be creating
Instances corresponding to each of the annotators who will be annotating (instances of knowtator human annotator)
a directory with the text sources you plan to annotate (or an implementation of TextSourceCollection)

Distribute Knowtator with annotation task

To distribute Knowtator to the annotators I have an ant script that creates a zip file which can be extracted in the Protégé home directory. The zip file includes Knowtator, which is unzipped into the plugins directory. The zip file also includes a directory labelled 'projects' that contains a directory named after the annotation project. This nested directory contains the Protégé .pprj, .pont, and .pins files and a directory where the text sources are kept. The file structure looks something like this:

<protege-home>/plugins/edu.uchsc.ccp.knowtator/<knowtator-files>
<protege-home>/projects/my-annotation-project/my-annotation-project.pprj
<protege-home>/projects/my-annotation-project/my-annotation-project.pont
<protege-home>/projects/my-annotation-project/my-annotation-project.pins
<protege-home>/projects/my-annotation-project/textsources/<text-source-files>
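
The layout above can also be sketched in Python. This is only an illustration of the directory structure, not the ant script mentioned on this page. The function name build_distribution_zip and its arguments are hypothetical, and the file contents are passed in as bytes so that the example stays self-contained.

```python
import io
import zipfile

# Directory of the Knowtator plugin, relative to the Protégé home directory.
PLUGIN_DIR = "plugins/edu.uchsc.ccp.knowtator"


def build_distribution_zip(project_name, plugin_files, project_files, text_sources):
    """Return the bytes of a zip laid out relative to the Protégé home directory.

    plugin_files, project_files and text_sources are dicts that map a file
    name to that file's contents as bytes (a hypothetical calling convention).
    """
    project_dir = f"projects/{project_name}"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Knowtator itself goes into the plugins directory.
        for name, data in plugin_files.items():
            archive.writestr(f"{PLUGIN_DIR}/{name}", data)
        # The .pprj, .pont and .pins files go into the project directory.
        for name, data in project_files.items():
            archive.writestr(f"{project_dir}/{name}", data)
        # Text sources go into a subdirectory of the project directory.
        for name, data in text_sources.items():
            archive.writestr(f"{project_dir}/textsources/{name}", data)
    return buffer.getvalue()
```

Writing the returned bytes to a file gives a zip that an annotator can extract directly in the Protégé home directory.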

Troubleshooting! If it is set up correctly, my-annotation-project.pprj will try to import knowtator.pprj when it is loaded. It is possible/probable that the pointer to knowtator.pprj in the my-annotation-project.pprj file is an absolute file path. My ant script automatically changes this path reference to a relative one such as ../../plugins/edu.uchsc.ccp.knowtator/knowtator.pprj.
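
As a rough Python illustration of that path fix (not the ant script itself), the relative path can be computed from the two locations inside the Protégé home directory and then substituted for the absolute one. The function names are hypothetical, and the sketch makes no assumption about the .pprj syntax: it simply replaces every occurrence of an absolute path that you supply.

```python
import posixpath


def relative_import_path(project_dir, knowtator_pprj):
    """Path of knowtator.pprj relative to the directory that holds the project file."""
    return posixpath.relpath(knowtator_pprj, start=project_dir)


def relativize_import(pprj_text, absolute_path, project_dir, knowtator_pprj):
    """Replace every occurrence of absolute_path in the .pprj text with the relative path."""
    relative = relative_import_path(project_dir, knowtator_pprj)
    return pprj_text.replace(absolute_path, relative)


# Both paths are given relative to the Protégé home directory.
print(relative_import_path(
    "projects/my-annotation-project",
    "plugins/edu.uchsc.ccp.knowtator/knowtator.pprj",
))
```

With the layout shown above, this prints ../../plugins/edu.uchsc.ccp.knowtator/knowtator.pprj.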

The prepare-annotation-project-zip-file target can create the zip file for you. I strongly recommend using ant to automate this kind of tedious and error-prone activity.

With a zip file that includes Knowtator and the annotation task, the annotator need only have Protégé installed and be able to unzip the zip file into the Protégé installation directory.

Annotators annotate

The first thing annotators should do when they first bring up Knowtator with a new annotation project is change the default annotator. This can be done with the menu item knowtator/configure, which brings up a (very minimalistic) configuration dialog. The first property, labelled Annotator, sets the default annotator for new annotations.
All of the annotation data is saved to the my-annotation-project.pins file. Annotators should back up this file to a medium such as a CD-R or flash drive. Our annotators email or upload their my-annotation-project.pins file once a week.

Merge annotations from multiple annotators

Each annotator should send you a .pins file (my-annotation-project.pins). Save this file as annotator-last-name.pins in a directory named by the date the data was collected such as:
oct-06-2005/doe.pins
For each annotator, create a Protégé project that opens on its own with that annotator's annotations. This can be done with the original annotation schema developed in step one above and the .pins file created by the annotator. Use the target called make-project-for-annotator in the ant script. Open the Protégé project for each annotator to make sure everything looks good.
Open one of the annotators' projects and use the 'knowtator/merge' menu item to merge the other annotators' projects into the open project (you may want to 'save as' the working project under another name such as 'all.pprj'). The GUI for doing this is less than elegant. Please be patient while the project is loading and being copied.
- Select the project that you want to add to the current project. The project will now load. A dialog will notify you when the project has been loaded.
- Click OK to proceed with merging the annotations. The GUI will go blank or inactive until this process is complete - monitor the system output for reassurance that something is going on. Hopefully, I will get around to making this look better - but it is a low priority.
Observe that all annotations are now in a single Protégé project.
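
The filing convention used at the start of this section (a directory named by collection date, one .pins file per annotator) is easy to script. The following is a minimal Python sketch under that convention. The function names and the root directory argument are hypothetical, and the month abbreviations are spelled out so that the result does not depend on the system locale.

```python
import shutil
from datetime import date
from pathlib import Path

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def collected_pins_path(root, collected_on, annotator_last_name):
    """Build <root>/<mon-dd-yyyy>/<last-name>.pins, e.g. oct-06-2005/doe.pins."""
    folder = f"{MONTHS[collected_on.month - 1]}-{collected_on.day:02d}-{collected_on.year}"
    return Path(root) / folder / f"{annotator_last_name.lower()}.pins"


def file_pins(received_file, root, collected_on, annotator_last_name):
    """Copy a received .pins file to its dated location and return the new path."""
    target = collected_pins_path(root, collected_on, annotator_last_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(received_file, target)
    return target
```

For example, collected_pins_path("collected", date(2005, 10, 6), "Doe") yields the path collected/oct-06-2005/doe.pins.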

Update/edit annotation schema on merged annotation project

If you need to update the annotation schema in "midstream", then you should update it with all annotations from all annotators loaded (as is done in the previous step). Simply giving the annotators an updated my-annotation-project.pont file is fraught with peril - so don't even try it. After making changes, redistribute the project's .pprj, .pont, and .pins files. If you don't want the other annotators to be burdened with all annotations from all annotators, you might need to do some extra work to remove annotations - or you might consider using the annotation filtering mechanism. Go back to step 2.

Run IAA metrics

Once all annotations are in a single Protégé project, it is possible to run inter-annotator agreement (IAA) metrics for those text sources that were annotated by multiple annotators. This step will be documented in a separate page as the code and documentation become available (in progress.)
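
Until that page exists, here is a minimal Python sketch of one simple agreement measure, to make the idea concrete. It is not the Knowtator IAA code and is not meant to describe it. It assumes that each annotation can be reduced to a tuple of (text source, start offset, end offset, annotation class), counts only exact matches, and compares only text sources that both annotators annotated. The function name is hypothetical.

```python
def pairwise_agreement(annotations_a, annotations_b):
    """Exact-match agreement between two annotators, or None if nothing is comparable.

    Each annotation is a tuple (text_source, start_offset, end_offset, annotation_class).
    """
    # Restrict the comparison to text sources that both annotators worked on.
    shared_sources = {a[0] for a in annotations_a} & {b[0] for b in annotations_b}
    set_a = {a for a in annotations_a if a[0] in shared_sources}
    set_b = {b for b in annotations_b if b[0] in shared_sources}
    # An annotation made identically by both annotators counts as a match for each of them.
    matches = 2 * len(set_a & set_b)
    # An annotation made by only one annotator is a non-match.
    non_matches = len(set_a ^ set_b)
    total = matches + non_matches
    return matches / total if total else None
```

The result is matches / (matches + non-matches), so 1.0 means the two annotators produced identical annotations on the shared text sources.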

Export annotations to XML

We have created a simple export utility that writes a set of annotations to XML documents. There is a menu item 'Knowtator -> Export annotations to XML' that will guide you through the process of exporting to XML.

Tips & Tricks

The following are some pieces of advice for making your experience with Knowtator optimal:

For a given annotation project, choose and set up your text source collection once and use it for the duration of the project. Several features in Knowtator assume that a selected text source instance is available in the currently open text source collection. This assumption may have disastrous results if existing text source instances correspond to different text source collections.
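
One way to guard against this is a quick consistency check before opening a project. The Python sketch below assumes that the text source collection is a directory of files and that each text source instance is named after its file. Both are assumptions made for illustration, and the function name is hypothetical. How the names of the text source instances are obtained from the project is left out.

```python
from pathlib import Path


def missing_text_sources(referenced_names, collection_dir):
    """Return the referenced text source names that have no file in collection_dir."""
    available = {p.name for p in Path(collection_dir).iterdir() if p.is_file()}
    return sorted(set(referenced_names) - available)
```

An empty result means every referenced text source is present in the directory. Anything else points to a mismatch between the project and the collection that is worth resolving first.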