
Knowtator

 

Managing annotation data

Introduction

Setting up an annotation project is labor-intensive and takes a lot of planning and hands-on management. This page is by no means a handbook for running your own annotation project. If you are thinking about running one, I suggest you read as much as you can about other successful annotation projects, contact the people who ran them, and pay them large amounts of money to give you lots of advice. This page is intended to help from a mechanical/technical perspective by describing some of the features of Knowtator that make managing annotation data easier.

Overview

All of the human annotators in our lab work part-time, for a limited number of hours on evenings and weekends. A key perk of the job is that they can work at home without an Internet connection. To make this possible, we chose Protégé's file-based (CLIPS) backend for saving the annotations that annotators create in Protégé. This persistence mechanism saves the contents of a Protégé project into three files: my-annotation-project.pprj (the project file), my-annotation-project.pont (the ontology file), and my-annotation-project.pins (the instances file). Nothing in the Knowtator code requires the CLIPS backend rather than the database backend (in fact, we have used the database backend). However, to keep deployment on the annotators' home machines as simple as possible, we use the file-based backend. The following documentation assumes this configuration.
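
As a small illustration of the three-file layout, the following Java sketch lists the files the file-based backend is expected to produce for a project name and reports any that are missing from a directory, which can be a handy sanity check before distributing or merging a project. The class and method names (ClipsProjectFiles, expectedFiles, missingFiles) are hypothetical and not part of Protégé or Knowtator; the sketch assumes only that the three files share the project's base name, as in the my-annotation-project example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ClipsProjectFiles {
    // File extensions written by the file-based (CLIPS) backend:
    // project (.pprj), ontology (.pont), and instances (.pins).
    static final String[] EXTENSIONS = {".pprj", ".pont", ".pins"};

    // Returns the three file names expected for a base name such as "my-annotation-project".
    static List<String> expectedFiles(String baseName) {
        List<String> names = new ArrayList<>();
        for (String ext : EXTENSIONS) {
            names.add(baseName + ext);
        }
        return names;
    }

    // Returns the expected files that are not present as regular files in projectDir.
    static List<String> missingFiles(Path projectDir, String baseName) {
        List<String> missing = new ArrayList<>();
        for (String name : expectedFiles(baseName)) {
            if (!Files.isRegularFile(projectDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical defaults: current directory and the example project name.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        String base = args.length > 1 ? args[1] : "my-annotation-project";
        List<String> missing = missingFiles(dir, base);
        System.out.println(missing.isEmpty()
                ? "All three project files are present."
                : "Missing: " + missing);
    }
}
```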

This page steps through an annotation workflow that we have put in place in our lab. The basic steps are:

  1. create an annotation schema
  2. distribute the annotation schema to the annotators
  3. annotators annotate
  4. merge annotations from multiple annotators
  5. (optional and discouraged) update/edit annotation schema on merged annotation project. Go back to step 2.
  6. run IAA metrics (as appropriate)
  7. export annotations to XML
  8. Tips & Tricks

Create an annotation schema

I will not attempt to advise you on how to create an annotation schema that is appropriate for your task. I encourage you to look at the annotation tasks provided in the examples directory, as well as those provided with Callisto, WordFreak, and GATE. As with any metadata, it pays to get the annotation schema for your task right first, before "real" annotation begins. The more effort you put into designing the annotation schema up front, the easier your life will be.

The Protégé project that you create will need:

Distribute Knowtator with annotation task

To distribute Knowtator to the annotators, I use an ant script that creates a zip file which can be extracted into the Protégé home directory. The zip file includes Knowtator, which is extracted into the plugins directory. It also includes a directory named 'projects' containing a subdirectory named after the annotation project. This nested directory contains the Protégé .pprj, .pont, and .pins files and a directory holding the text sources. The file structure looks something like this:

    (Protégé home directory)
      plugins/
        edu.uchsc.ccp.knowtator/
          knowtator.pprj
      projects/
        my-annotation-project/
          my-annotation-project.pprj
          my-annotation-project.pont
          my-annotation-project.pins
          (directory of text sources)

Troubleshooting! When set up correctly, my-annotation-project.pprj imports knowtator.pprj when it is loaded. The reference to knowtator.pprj stored in my-annotation-project.pprj is likely to be an absolute file path, which will probably not resolve on an annotator's machine. My ant script automatically changes this reference to a relative path such as ../../plugins/edu.uchsc.ccp.knowtator/knowtator.pprj.
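
The path fix performed by the ant script can be sketched in Java with java.nio.file.Path.relativize. This is a hypothetical stand-in, not the author's script: PprjPathFixer and its methods are illustrative names, and rewrite assumes that the absolute path to knowtator.pprj appears verbatim as a plain string inside my-annotation-project.pprj. Check how your own .pprj file stores the reference before relying on a simple string replacement.

```java
import java.nio.file.Path;

public class PprjPathFixer {
    // Computes includedPprj relative to projectDir, always using '/' separators
    // so the result reads the same on every platform.
    static String relativeInclude(Path projectDir, Path includedPprj) {
        Path from = projectDir.toAbsolutePath().normalize();
        Path to = includedPprj.toAbsolutePath().normalize();
        return from.relativize(to).toString().replace('\\', '/');
    }

    // Replaces every verbatim occurrence of the absolute include path in the
    // .pprj text. Assumes the path is stored as a plain string in the file.
    static String rewrite(String pprjText, String absoluteInclude, String relativeInclude) {
        return pprjText.replace(absoluteInclude, relativeInclude);
    }

    public static void main(String[] args) {
        // Hypothetical Protégé home laid out like the extracted zip file.
        Path protegeHome = Path.of("protege-home");
        Path projectDir = protegeHome.resolve("projects").resolve("my-annotation-project");
        Path knowtator = protegeHome.resolve("plugins")
                .resolve("edu.uchsc.ccp.knowtator").resolve("knowtator.pprj");
        System.out.println(relativeInclude(projectDir, knowtator));
        // prints ../../plugins/edu.uchsc.ccp.knowtator/knowtator.pprj
    }
}
```

Computing the relative path from the two locations, rather than hard-coding ../../, keeps the rewrite correct if a project ends up nested at a different depth.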

The prepare-annotation-project-zip-file target can create the zip file for you. I strongly recommend using ant to automate this kind of tedious, error-prone activity.
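
The packaging step itself can be sketched with java.util.zip, which may help if you need to adapt it. This is a minimal sketch, not the prepare-annotation-project-zip-file target: AnnotationZipBuilder and the local source paths in main are hypothetical, and it does not perform the .pprj path rewrite. The entry prefixes mirror the layout in which Knowtator sits under plugins/edu.uchsc.ccp.knowtator and the annotation task under projects/my-annotation-project.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class AnnotationZipBuilder {
    // Adds every regular file under sourceDir to the zip, with entry names
    // prefixed by zipPrefix (for example "plugins/edu.uchsc.ccp.knowtator").
    static void addTree(ZipOutputStream zip, Path sourceDir, String zipPrefix) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path file : files) {
            // Zip entry names always use '/' separators.
            String rel = sourceDir.relativize(file).toString().replace('\\', '/');
            zip.putNextEntry(new ZipEntry(zipPrefix + "/" + rel));
            Files.copy(file, zip);
            zip.closeEntry();
        }
    }

    // Builds a zip whose layout mirrors the Protégé installation directory.
    static void build(Path zipFile, Path knowtatorDir, Path projectDir, String projectName)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            addTree(zip, knowtatorDir, "plugins/edu.uchsc.ccp.knowtator");
            addTree(zip, projectDir, "projects/" + projectName);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local source paths; adjust to your own build layout.
        build(Path.of("my-annotation-project.zip"),
              Path.of("build/edu.uchsc.ccp.knowtator"),
              Path.of("annotation-projects/my-annotation-project"),
              "my-annotation-project");
    }
}
```

Extracting the resulting zip into the Protégé installation directory should give the plugins/ and projects/ layout that a relative reference such as ../../plugins/edu.uchsc.ccp.knowtator/knowtator.pprj expects.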

With a zip file that includes Knowtator and the annotation task, the annotator need only have Protégé installed and be able to unzip the zip file into the Protégé installation directory.

Annotators annotate

Merge annotations from multiple annotators

Update/edit annotation schema on merged annotation project

If you need to update the annotation schema midstream, you should update it with all annotations from all annotators loaded (as in the previous step). Simply giving the annotators an updated my-annotation-project.pont file is fraught with peril, so don't even try it. After making changes, redistribute my-annotation-project's .pprj, .pont, and .pins files. If you don't want to burden the other annotators with all annotations from all annotators, you may need to do some extra work to remove annotations, or you might consider using the annotation filtering mechanism. Go back to step 2.

Run IAA metrics

Once all annotations are in a single Protégé project, it is possible to run inter-annotator agreement (IAA) metrics for those text sources that were annotated by multiple annotators. This step will be documented on a separate page once the code and documentation become available (work in progress).
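
As a simple illustration of what an IAA metric can look like, the following Java sketch computes pairwise agreement on a single text source, counting an annotation as a match only when both annotators marked the identical span with the same class. This is a common baseline, not the method implemented by Knowtator's IAA code; the Annotation record and the exact-match criterion are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleIaa {
    // Hypothetical annotation: character offsets [start, end) and a class label.
    record Annotation(int start, int end, String label) {}

    // Pairwise agreement on one text source. An annotation matches only if the
    // other annotator has one with the identical span and label; each
    // annotation can be matched at most once. Returns 2 * matches / (|a| + |b|),
    // which equals the F-measure obtained by treating either annotator as the reference.
    static double exactMatchAgreement(List<Annotation> a, List<Annotation> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // neither annotator marked anything, so there is no disagreement
        }
        List<Annotation> unmatched = new ArrayList<>(b);
        int matches = 0;
        for (Annotation x : a) {
            if (unmatched.remove(x)) { // record equality compares all components
                matches++;
            }
        }
        return 2.0 * matches / (a.size() + b.size());
    }

    public static void main(String[] args) {
        List<Annotation> annotator1 = List.of(
                new Annotation(0, 5, "gene"),
                new Annotation(10, 15, "protein"));
        List<Annotation> annotator2 = List.of(
                new Annotation(0, 5, "gene"),
                new Annotation(10, 16, "protein"),
                new Annotation(20, 25, "gene"));
        System.out.println(exactMatchAgreement(annotator1, annotator2)); // prints 0.4
    }
}
```

With exact matching, a span that differs by a single character counts as a disagreement; a more lenient criterion (for example, overlapping spans) would credit the second annotation pair in main.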

Export annotations to XML

We have created a simple export utility that writes a set of annotations to XML documents. There is a menu item 'Knowtator -> Export annotations to XML' that will guide you through the process of exporting to XML.
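
If you need to write annotations to XML from your own code, for example in a downstream script, a minimal Java sketch is shown below. The element and attribute names (annotations, annotation, textSource, annotator, start, end, class) and the Annotation record are hypothetical and do not describe the format written by 'Knowtator -> Export annotations to XML'; the point is only to show one element per annotation and correct escaping of the annotated text.

```java
import java.util.List;

public class AnnotationXmlSketch {
    // Hypothetical annotation record; not Knowtator's data model.
    record Annotation(String annotator, int start, int end, String label, String spannedText) {}

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes one <annotation> element per annotation for a single text source.
    static String toXml(String textSource, List<Annotation> annotations) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<annotations textSource=\"").append(escape(textSource)).append("\">\n");
        for (Annotation a : annotations) {
            sb.append("  <annotation annotator=\"").append(escape(a.annotator()))
              .append("\" start=\"").append(a.start())
              .append("\" end=\"").append(a.end())
              .append("\" class=\"").append(escape(a.label()))
              .append("\">").append(escape(a.spannedText()))
              .append("</annotation>\n");
        }
        sb.append("</annotations>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Annotation> anns = List.of(
                new Annotation("annotator1", 0, 10, "gene", "p53 & MDM2"));
        System.out.print(toXml("doc1.txt", anns));
    }
}
```

Escaping matters because annotated text often contains characters such as & or <. The sketch does not filter control characters that are illegal in XML 1.0, so input containing them would still need cleaning.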

Tips & Tricks

The following are some pieces of advice for getting the most out of Knowtator: