Tools for text digitisation
An overview of more than 250 state-of-the-art tools for text digitisation. The tools can be filtered by purpose, and any registered user of our website can add new tools by completing a simple form. The tools are classified into the following main groups:
- Image enhancement
- Text recognition (OCR)
- OCR post-correction and enrichment
- Text processing
The Impact Centre of Competence Demonstrator platform lets users test a number of tools online without installing any software on their computers. Registered users can test the tools with default parameters, while Impact Centre of Competence members can also choose their own configuration and use their own input files.
The language institutes participating in the IMPACT project built lexica for historical languages. The aim is to improve OCR results for historical text and to ensure that users find historical variants of a word when searching for its modern-day form.
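The search use case can be illustrated with a minimal sketch of lexicon-based query expansion, assuming a lexicon that maps each modern word form to a set of historical spelling variants. The `LEXICON` entries and the `expand_query` and `search` helpers are hypothetical and illustrative only; they do not reflect the actual format or content of the IMPACT lexica.

```python
# Hypothetical lexicon: maps a modern-day word form to historical spelling variants.
# The entries are illustrative only; they are not taken from the IMPACT lexica.
LEXICON: dict[str, set[str]] = {
    "music": {"musick", "musicke"},
    "public": {"publick", "publicke"},
}


def expand_query(term: str) -> set[str]:
    """Return the modern term plus any historical variants listed in the lexicon."""
    term = term.lower()
    return {term} | LEXICON.get(term, set())


def search(documents: list[str], term: str) -> list[str]:
    """Return the documents that contain the modern term or one of its variants."""
    variants = expand_query(term)
    hits = []
    for doc in documents:
        # Naive tokenisation: split on whitespace, strip trailing punctuation, lowercase.
        tokens = {token.strip(".,;:!?").lower() for token in doc.split()}
        if tokens & variants:
            hits.append(doc)
    return hits


docs = [
    "A treatise of musick and harmony.",
    "An account of the publick library.",
    "Notes on modern music.",
]
print(search(docs, "music"))
```

Searching for the modern form "music" also retrieves the document spelled "musick", which a plain exact-match search would miss.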
The IMPACT project built lexica for nine historical languages, as well as special lexica for named entities (for example, the names of specific places and people) in three languages. Most of these resources are available to the public under a non-commercial license. To access these lexica, users only need to register with the Impact Centre.
Image and Ground Truth resources
The Impact Centre of Competence dataset contains more than half a million representative text-based images compiled by a number of major European libraries. Covering texts from as early as 1500, and containing material from newspapers, books, pamphlets and typewritten notes, the dataset is an invaluable resource for future research into imaging technology, OCR and language enrichment.