Authors: Mathieu Fannee, Tom Kenter (INL)

Short functional description

CoBaLT (Corpus Based Lexicon Tool) is an application into which a corpus of texts can be loaded so that its tokens can be annotated (lemmatized and more). The annotation work in CoBaLT yields two products: an annotated corpus, and a lexicon consisting of word forms together with the lemmata and other annotations assigned to them.

In terms of workflow, working with CoBaLT means:

  1. Loading a corpus into the tool, 
  2. Using the tool to annotate the corpus,
  3. Finally, exporting the result of this work as an enriched version of the corpus files originally loaded into the tool. The resulting lexicon can be exported to XML, or the lexicon database built during the CoBaLT work can be re-used directly.
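
The three steps above can be pictured with a minimal sketch. It is not CoBaLT code: the function names, the whitespace tokenizer and the sample data are all assumptions made up for illustration, and CoBaLT itself keeps this information in a MySQL database rather than in memory. The sketch only shows how annotating tokens gives both an annotated corpus and a lexicon of word forms with their lemmata.

```python
from collections import defaultdict

# Hypothetical illustration only; none of these names exist in CoBaLT.

def load_corpus(documents):
    """Step 1: split each document into tokens (naive whitespace tokenizer)."""
    return {name: text.split() for name, text in documents.items()}

def annotate(corpus, lemma_of):
    """Step 2: pair each token with a lemma; None marks a token not yet annotated."""
    return {
        name: [(token, lemma_of.get(token.lower())) for token in tokens]
        for name, tokens in corpus.items()
    }

def build_lexicon(annotated):
    """Step 3: collect the word forms and the lemmata assigned to them."""
    lexicon = defaultdict(set)
    for tokens in annotated.values():
        for word_form, lemma in tokens:
            if lemma is not None:
                lexicon[word_form.lower()].add(lemma)
    return dict(lexicon)

documents = {"doc1.txt": "The cats sat"}
lemma_of = {"cats": "cat", "sat": "sit"}  # annotator decisions, invented for this example

annotated = annotate(load_corpus(documents), lemma_of)  # product 1: annotated corpus
lexicon = build_lexicon(annotated)                      # product 2: lexicon
```

Here "The" stays unannotated, so it appears in the annotated corpus with no lemma and is left out of the lexicon.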

Technical overview

CoBaLT is an AJAX application designed for Mozilla Firefox (other browsers may work as well, but are not recommended). It uses a MySQL database and is written in PHP, Perl and JavaScript.

During installation, some PHP, MySQL and Apache variables need to be set to ensure that the tool functions properly. See the ‘Apache settings’, ‘PHP settings’ and ‘MySql settings’ sections for details.

Some of the settings discussed below are intended to facilitate working with very large data sets; pay special attention to them if you need to process large amounts of data.