Tools & Resources
The IMPACT Centre of Competence provides access to a remarkable collection of tools and resources for the digitisation of historical texts. Some of the tools can be tested online in our Demonstrator Platform. The resources consist of historical lexica and an image and ground-truth dataset for 10 different languages.
Tools for text digitisation
An overview of more than 250 state-of-the-art tools for text digitisation. These tools can be filtered according to their purpose. The main groups into which these tools are classified are:
Language resources
The various language institutes in the IMPACT project built lexica for historical languages. The aim is to improve OCR results for historical text, and also to ensure that users find historical variants of a word when searching for its modern-day form.
The IMPACT project built lexica for ten historical languages. It also built special lexica for named entities (proper names of, for example, places and people) in three languages.
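As a rough illustration of how a historical lexicon can help a search find older spellings, the sketch below expands a modern-day search term with its listed variants before matching. It is a minimal sketch under stated assumptions, not the IMPACT implementation: the dictionary `HISTORICAL_VARIANTS`, its entries, and the functions `expand_query` and `search` are hypothetical names made up for this example, and the entries are not taken from the IMPACT lexica.

```python
# Hypothetical miniature lexicon: modern-day form -> historical spellings.
# The entries are illustrative only and are NOT taken from the IMPACT lexica.
HISTORICAL_VARIANTS = {
    "music": ["musick", "musicke"],
    "public": ["publick", "publique"],
}


def expand_query(term, lexicon=HISTORICAL_VARIANTS):
    """Return the lower-cased search term followed by any variants listed for it."""
    key = term.lower()
    return [key] + lexicon.get(key, [])


def search(term, documents, lexicon=HISTORICAL_VARIANTS):
    """Return the indices of documents containing the term or one of its variants."""
    forms = set(expand_query(term, lexicon))
    return [
        index
        for index, text in enumerate(documents)
        # Naive whitespace tokenisation; a real system would use a proper tokenizer.
        if forms & set(text.lower().split())
    ]
```

With this sketch, a search for "music" would also match a document that only contains the spelling "musick", because the query is expanded before the documents are compared.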
Image and Ground Truth resources
The IMPACT Centre of Competence dataset contains more than half a million representative text-based images compiled by a number of major European libraries. Covering texts from as early as 1500, and containing material from newspapers, books, pamphlets and typewritten notes, the dataset is an invaluable resource for future research into imaging technology, OCR and language enrichment.