Managing annotation data
Introduction
Setting up an annotation project is a very labor-intensive task that takes a lot of planning
and hands-on management. This page is by no means trying to provide a handbook for running
your own annotation project. If you are thinking about running your own annotation project,
I suggest you read as much as you can about other successful annotation projects, contact the
people who ran them, and pay them large amounts of money to give you lots of advice.
This page is intended to help from a mechanical/technical
perspective by describing some of the features of Knowtator that make managing the annotation
data easier.
Overview
All of the human annotators in our lab work part-time, for a limited number of
hours on weekends and evenings. A key perk of this job is that they can work at home
without being connected to the Internet. To make this possible, we elected to use the
Protégé file-based (CLIPS) backend for saving the annotations that annotators create in Protégé.
This persistence mechanism saves the contents of a Protégé project into three files:
my-annotation-project.pprj (the project file), my-annotation-project.pont (the ontology file), and my-annotation-project.pins (the instances file).
There is nothing about the Knowtator code that requires using the CLIPS backend
vs. the Database backend (in fact, we have used the DB backend). However, to make deployment as simple
as possible on annotators' home machines, we use the file-based backend. The following documentation
assumes this configuration.
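Since everything that follows depends on the three files of the file-based backend travelling together, a small check can be handy before distributing or merging. The following is only a sketch: the function names are made up for illustration, and it assumes all three files share the project name and sit in the same directory, as in the example names above.

```python
from pathlib import Path

# The three extensions of the Protégé file-based backend, as listed above.
PROJECT_EXTENSIONS = (".pprj", ".pont", ".pins")


def project_files(project_dir, project_name):
    """Return the expected .pprj, .pont and .pins paths for a project."""
    base = Path(project_dir)
    return [base / f"{project_name}{ext}" for ext in PROJECT_EXTENSIONS]


def missing_project_files(project_dir, project_name):
    """Return the expected project files that are not present on disk."""
    return [p for p in project_files(project_dir, project_name) if not p.is_file()]
```

An empty result from missing_project_files means all three files are in place.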
This page steps through an annotation workflow that we have put in place in our lab. The basic steps are:
1. create an annotation schema
2. distribute the annotation schema to the annotators
3. annotators annotate
4. merge annotations from multiple annotators
5. (optional and discouraged) update/edit the annotation schema on the merged annotation project, then go back to step 2
6. run IAA metrics (as appropriate)
7. export annotations to XML
A Tips & Tricks section follows the workflow. There is also an ant script that can be modified
to perform some of the tasks described below.
Create an annotation schema
I will not attempt to provide advice on how to create an annotation schema that is appropriate for your task.
I encourage you to look at the annotation tasks that are provided in the examples directory and
at the annotation tasks that are provided with Callisto, WordFreak, and GATE.
As with any metadata, it helps to get the annotation schema for your annotation task
right first, before "real" annotation begins. The more upfront effort you put into designing the
annotation schema, the easier your life will be.
The Protégé project that you create will need:
- Knowtator included (see the installation page and the configuration page)
- Classes/Instances defined that represent the kinds of annotations you will be creating
- Instances corresponding to each of the annotators who will be annotating (instances of knowtator human annotator)
- a directory with the text sources you plan to annotate (or an implementation of TextSourceCollection)
Distribute Knowtator with annotation task
To distribute Knowtator to the annotators, I have an ant script that creates a zip file which can be
extracted in the Protégé home directory. The zip file includes Knowtator, which is unzipped into
the plugins directory. The zip file also includes a directory labelled 'projects'
that contains a directory named after the annotation project. This nested directory contains
the Protégé .pprj, .pont, and .pins files and a directory that holds the text sources. The file
structure looks something like this:
- <protege-home>/plugins/edu.uchsc.ccp.knowtator/<knowtator-files>
- <protege-home>/projects/my-annotation-project/my-annotation-project.pprj
- <protege-home>/projects/my-annotation-project/my-annotation-project.pont
- <protege-home>/projects/my-annotation-project/my-annotation-project.pins
- <protege-home>/projects/my-annotation-project/textsources/<text-source-files>
Troubleshooting! If set up correctly, my-annotation-project.pprj will try to import
knowtator.pprj when loaded. It is possible/probable that the pointer to knowtator.pprj in the
my-annotation-project.pprj file is an absolute file path, which will generally not resolve on an
annotator's machine. My ant script automatically changes this path reference to a relative one, such as:
../../plugins/edu.uchsc.ccp.knowtator/knowtator.pprj
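For readers who do not use ant, the same path fix can be sketched in a few lines. This is not the ant script itself. It assumes the absolute path appears in the .pprj file as a literal string with forward slashes, which I have not verified for every Protégé version, and the function names are hypothetical.

```python
import posixpath


def relative_include_path(project_dir, knowtator_pprj):
    """Path to knowtator.pprj relative to the directory that holds the project."""
    return posixpath.relpath(knowtator_pprj, start=project_dir)


def make_include_relative(pprj_text, absolute_path, project_dir):
    """Replace each occurrence of absolute_path in the .pprj text with a relative path.

    Assumption: the path is stored as plain text exactly as given in absolute_path.
    """
    relative = relative_include_path(project_dir, absolute_path)
    return pprj_text.replace(absolute_path, relative)
```

With the layout shown above, a project in <protege-home>/projects/my-annotation-project and Knowtator in <protege-home>/plugins/edu.uchsc.ccp.knowtator yield the relative path given in the troubleshooting note.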
The prepare-annotation-project-zip-file target can create the zip file for you.
I strongly recommend using ant to automate this kind of tedious and error-prone activity.
With a zip file that includes Knowtator and the annotation task, the annotator need only have Protégé
installed and be able to unzip the zip file into the Protégé installation directory.
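If ant is not an option, the zip file can be assembled with any scripting language. The following is a rough sketch of the layout described above, not a port of the prepare-annotation-project-zip-file target. The function name and its parameters are made up for illustration, and the output zip should be written somewhere outside the two source directories so that it does not include itself.

```python
import zipfile
from pathlib import Path


def build_distribution_zip(zip_path, knowtator_dir, project_dir, project_name):
    """Zip Knowtator and one annotation project for extraction in the Protégé home.

    Archive names are relative to the Protégé home directory:
    plugins/edu.uchsc.ccp.knowtator/... and projects/<project_name>/...
    """
    knowtator_dir, project_dir = Path(knowtator_dir), Path(project_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Knowtator files go under the plugins directory.
        for src in sorted(knowtator_dir.rglob("*")):
            if src.is_file():
                rel = src.relative_to(knowtator_dir).as_posix()
                zf.write(src, f"plugins/edu.uchsc.ccp.knowtator/{rel}")
        # Project files (.pprj, .pont, .pins, text sources) go under projects/.
        for src in sorted(project_dir.rglob("*")):
            if src.is_file():
                rel = src.relative_to(project_dir).as_posix()
                zf.write(src, f"projects/{project_name}/{rel}")
    return zip_path
```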
Annotators annotate
- The first thing annotators should do when they bring up Knowtator with a new annotation project
is to change the default annotator. This can be done with the menu item knowtator/configure. This
brings up a (very minimalistic) configuration dialog. The first property labelled Annotator
sets the default annotator for new annotations.
- All of the annotation data is saved to the my-annotation-project.pins file. Annotators should back up
this file to a medium such as a CD-R or flash drive. Our annotators email or upload their
my-annotation-project.pins file once a week.
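Backing up the .pins file is easy to forget, so it may be worth giving annotators a small script for it. Here is a minimal sketch that copies the file into a backup directory with the date in its name. The function name, the backup directory, and the naming scheme are my own choices for illustration, not something Knowtator provides.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_pins(pins_file, backup_dir, today=None):
    """Copy the .pins file into backup_dir, adding the date to the file name."""
    pins_file, backup_dir = Path(pins_file), Path(backup_dir)
    today = today or date.today()
    backup_dir.mkdir(parents=True, exist_ok=True)
    # e.g. my-annotation-project-2005-10-06.pins
    target = backup_dir / f"{pins_file.stem}-{today.isoformat()}{pins_file.suffix}"
    shutil.copy2(pins_file, target)
    return target
```

Pointing backup_dir at a mounted flash drive gives a dated copy for each day the script is run.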
Merge annotations from multiple annotators
- Each annotator should send you a .pins file (my-annotation-project.pins). Save this file as
annotator-last-name.pins in a directory named by the date the data was collected such as:
oct-06-2005/doe.pins
- For each annotator, create a Protégé project that opens by itself with
the annotations from the annotator. This can be done with the original annotation schema
developed in step one above and the .pins file created by the annotator. Use the target called
make-project-for-annotator in the ant script. Open each annotator's Protégé project
to make sure everything looks good.
- Open one of the annotators' projects and use the 'knowtator/merge' menu item to merge the other annotators' projects into
the open project (you may want to 'save as' the working project as something else like 'all.pprj').
The GUI for doing this is less than elegant. Please be patient while the project is loading and being copied.
- Select the project that you want to add to the current project. The project will now load. A dialog will notify
you when the project has been loaded.
- Click OK to proceed with merging the annotations. The GUI will go blank or inactive until this process is complete, so
monitor the system output for reassurance that something is going on. Hopefully, I will get around to making this
look better, but it is a low priority.
- Observe that all annotations are now in a single Protégé project.
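The first two steps above (filing each received .pins file by date and annotator, then building a stand-alone project per annotator) are pure file shuffling and can be scripted. The sketch below is not the make-project-for-annotator target. It assumes that a project can be assembled by placing the original .pprj and .pont files next to the annotator's .pins file under the shared project name, which matches my reading of the step above but should be checked by opening the result in Protégé. All function names are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def collection_dir_name(collected_on):
    """Format a date like the example above, e.g. oct-06-2005."""
    month = MONTHS[collected_on.month - 1]
    return f"{month}-{collected_on.day:02d}-{collected_on.year}"


def file_incoming_pins(pins_file, last_name, collected_on, archive_root):
    """Save a received .pins file as <archive_root>/<date>/<last-name>.pins."""
    target_dir = Path(archive_root) / collection_dir_name(collected_on)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{last_name.lower()}.pins"
    shutil.copy2(pins_file, target)
    return target


def make_annotator_project(schema_dir, project_name, annotator_pins, out_dir):
    """Combine the original schema files with one annotator's .pins file."""
    schema_dir, out_dir = Path(schema_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for ext in (".pprj", ".pont"):
        shutil.copy2(schema_dir / f"{project_name}{ext}",
                     out_dir / f"{project_name}{ext}")
    # The annotator's instances take the place of the schema's own .pins file.
    shutil.copy2(annotator_pins, out_dir / f"{project_name}.pins")
    return out_dir / f"{project_name}.pprj"
```

The merge itself still has to be done through the 'knowtator/merge' menu item.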
Update/edit annotation schema on merged annotation project
If you need to update the annotation schema in "midstream", then you should update it with all annotations from all
annotators loaded (as is done in the previous step). Simply giving the annotators an updated my-annotation-project.pont
file is fraught with peril, so don't even try it. After making changes, redistribute the my-annotation-project's .pprj, .pont, and
.pins files. If you don't want the other annotators to be burdened with all annotations from all annotators, you might need to
do some extra work to remove annotations, or you might consider using the annotation filtering mechanism. Go back to
step 2.
Run IAA metrics
Once all annotations are in a single Protégé project, it is possible to run inter-annotator agreement (IAA) metrics
for those text sources that were annotated by multiple annotators. This step will be documented in a separate page as the code
and documentation become available (in progress).
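Until that page exists, here is a rough idea of what a very simple agreement score between two annotators can look like. This is a generic exact-match measure and not Knowtator's IAA code. It assumes each annotation has already been reduced to a tuple of (text source, span start, span end, class name), a representation chosen purely for illustration.

```python
def exact_match_agreement(annotations_a, annotations_b):
    """Share of annotations that both annotators made identically.

    Each annotation is a hashable tuple such as
    (text_source, span_start, span_end, class_name).
    Every matched pair counts once for each annotator, so the score is
    2 * |A and B| / (|A| + |B|). Returns None when there are no annotations.
    """
    a, b = set(annotations_a), set(annotations_b)
    total = len(a) + len(b)
    if total == 0:
        return None
    return 2 * len(a & b) / total
```

A score of 1.0 means the two annotators produced exactly the same annotations; 0.0 means they share none.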
Export annotations to XML
We have created a simple export utility that writes a set of annotations to XML documents. There is a menu item 'Knowtator -> Export
annotations to XML' that will guide you through the process of exporting to XML.
Tips & Tricks
The following are some pieces of advice for making your experience with Knowtator optimal:
- For a given annotation project, choose and set up your text source collection once and use it throughout the duration of the project.
There are several features in Knowtator that assume that if a text source instance is selected, it is available in the currently
open text source collection. This assumption may have disastrous results if existing text source instances correspond to different text source collections.
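One way to catch a mismatched text source collection early is to compare the text sources your annotations refer to against the files actually present in the text source directory. The sketch below assumes that text source names correspond to file names in a single directory, and that you have obtained the list of names by some other means. Both are assumptions on my part, and the function name is hypothetical.

```python
from pathlib import Path


def missing_text_sources(text_source_names, text_source_dir):
    """Return the text source names that have no matching file in the directory."""
    available = {p.name for p in Path(text_source_dir).iterdir() if p.is_file()}
    return sorted(name for name in text_source_names if name not in available)
```

Any name returned here points at a text source instance that the currently configured collection cannot supply.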
Maintained by Philip V. Ogren.
This file last modified Monday, 08-Dec-2008 21:58:28 UTC