Authors: Mathieu Fannee, Tom Kenter (INL)
Short functional description
CoBaLT (Corpus Based Lexicon Tool) is an application into which a corpus of texts can be loaded so that its tokens can be annotated (lemmatized, among other things). Annotation work in CoBaLT yields two products: an annotated corpus, and a lexicon consisting of word forms together with the lemmata and other information assigned to them.
In terms of workflow, working with CoBaLT means:
- Loading a corpus into the tool,
- Using the tool to annotate the corpus,
- Finally, exporting the result of this work as an enriched version of the corpus files originally loaded into the tool. The resulting lexicon can be exported to XML, or the lexicon database built during the CoBaLT work can simply be re-used.
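The three steps above, and the way annotation yields both an annotated corpus and a lexicon, can be illustrated with a small sketch. This is not CoBaLT code and does not reflect its database schema: the function names (tokenize, annotate, build_lexicon), the whitespace tokenization and the in-memory data structures are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical in-memory model for illustration only; CoBaLT itself keeps
# its data in a MySQL database and is operated through its interface.


def tokenize(text):
    """Step 1 (load): split a document into (position, word form) tokens."""
    return list(enumerate(text.split()))


def annotate(tokens, lemma_of):
    """Step 2 (annotate): attach a lemma to each token, or None if unknown."""
    return [(pos, form, lemma_of.get(form.lower())) for pos, form in tokens]


def build_lexicon(annotated):
    """Step 3 (export): group the annotated word forms under their lemma."""
    lexicon = defaultdict(set)
    for _pos, form, lemma in annotated:
        if lemma is not None:
            lexicon[lemma].add(form.lower())
    return {lemma: sorted(forms) for lemma, forms in lexicon.items()}


tokens = tokenize("The cats saw a cat")
annotated = annotate(tokens, {"cats": "cat", "cat": "cat", "saw": "see"})
lexicon = build_lexicon(annotated)
```

Here `annotated` plays the role of the enriched corpus (every token keeps its position and gains a lemma where one was assigned), while `lexicon` is the second product, derived entirely from those annotations.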
During installation, some PHP, MySQL and Apache variables need to be set to ensure that the tool functions properly. See the ‘Apache settings’, ‘PHP settings’ and ‘MySql settings’ sections for details.
Some of the settings discussed below are intended to facilitate working with very large data sets; give them special attention if you need to process large amounts of data.