Resources
IMPACT provides access to a remarkable collection of resources for the digitisation of historical text.
The language institutes in the IMPACT project built lexica for ten historical languages. The aim is to improve OCR results for historical text, and also to ensure that users find historical variants of a word when searching for its modern-day form. The project also built special lexica for named entities (specific names, for example of places and people) in three languages.
The IMPACT Centre of Competence dataset contains more than half a million representative text-based images compiled by a number of major European libraries. Covering texts from as early as 1500, and containing material from newspapers, books, pamphlets and typewritten notes, the dataset is an invaluable resource for future research into imaging technology, OCR and language enrichment.
Language resources
- Historical Lexica
- Named Entities Lexica
- Historical Corpora