Use this tool when you want to clean up your transcription, either for e-book creation or for better search results. The tool allows dozens of concurrent users to work on the same book or newspaper. Once the adaptive OCR is in use, you can also benefit from monitoring capabilities and therefore use external operators (either volunteers or a low-cost service provider). The tool incorporates several productivity features to allow a fast and accurate verification process.
The COoperative eNgine for Correction of ExtRacted Text (CONCERT) is a web-based platform, suitable for massive volunteer participation, which validates and corrects OCR results. In this way, it enables the general public to help with large scale digitisation efforts.
The technology streamlines, simplifies and accelerates the process of winnowing out questionable text scans, enabling reviewers to key in corrections to the text. Instead of displaying an entire scanned page, reviewers only see the actual letters or words in question. For example, the letter combination “r” and “n” (“rn”) may appear indistinguishable from the letter “m.” In those instances, the system collects many instances of the letter “m,” and places these samples next to the letters in question, making it much easier to determine the letter’s real identity.
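The character-level grouping described above can be illustrated with a minimal sketch. It is not CONCERT's actual implementation: the `Glyph` record, the `build_review_batches` function, the confidence threshold of 0.9 and the limit of three samples are all assumptions made for illustration. The idea is only that glyphs the OCR engine read as the same character are collected, and the doubtful ones are presented next to a few confidently recognised samples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    glyph_id: str      # hypothetical identifier of a cropped character image
    ocr_char: str      # the character the OCR engine assigned
    confidence: float  # OCR confidence in [0, 1] (assumed to be available)


def build_review_batches(glyphs, threshold=0.9, max_samples=3):
    """Group glyphs by OCR character and pair doubtful ones with trusted samples."""
    by_char = defaultdict(list)
    for g in glyphs:
        by_char[g.ocr_char].append(g)

    batches = {}
    for char, group in by_char.items():
        # Trusted glyphs, most confident first, serve as visual references.
        trusted = sorted(
            (g for g in group if g.confidence >= threshold),
            key=lambda g: g.confidence,
            reverse=True,
        )
        doubtful = [g for g in group if g.confidence < threshold]
        if doubtful:  # only characters with something to review get a batch
            batches[char] = {
                "samples": [g.glyph_id for g in trusted[:max_samples]],
                "questionable": [g.glyph_id for g in doubtful],
            }
    return batches


glyphs = [
    Glyph("g1", "m", 0.99),
    Glyph("g2", "m", 0.97),
    Glyph("g3", "m", 0.55),  # low confidence: might really be "rn"
    Glyph("g4", "e", 0.98),
]
batches = build_review_batches(glyphs)
```

In this sketch the reviewer would see `g3` beside the trusted samples `g1` and `g2`, and no batch is produced for "e" because none of its glyphs is in doubt.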
In cases where an entire word is suspect, it is added to a collection of other questionable terms, which are then arranged in alphabetical order. Volunteer reviewers need only accept or reject suggested substitutes with one keystroke. In addition, the system uses adaptive dictionary enrichment, a method in which new words are added to a central dictionary based on cross-identification and correction by other users.
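As a rough illustration of the word-level step, the sketch below builds an alphabetical queue of suspect words and enriches a shared dictionary once enough distinct users have confirmed a word. The class name `WordReview`, its methods, and the rule of two independent confirmations are hypothetical; the source does not say how CONCERT decides when a word enters the central dictionary, and the sketch simplifies "accept or reject a suggested substitute" to a single yes/no decision per word.

```python
from collections import defaultdict


class WordReview:
    """Toy model of suspect-word review with adaptive dictionary enrichment."""

    def __init__(self, dictionary, min_confirmations=2):
        self.dictionary = {w.lower() for w in dictionary}
        self.min_confirmations = min_confirmations  # assumed threshold
        self.confirmations = defaultdict(set)       # word -> ids of confirming users

    def suspect_queue(self, words):
        """Return the words not in the dictionary, deduplicated and alphabetical."""
        suspects = {w for w in words if w.lower() not in self.dictionary}
        return sorted(suspects, key=str.lower)

    def record(self, user, word, accepted):
        """Store one reviewer decision; return True if the word is now in the dictionary."""
        key = word.lower()
        if accepted:
            self.confirmations[key].add(user)  # repeated votes by one user count once
            if len(self.confirmations[key]) >= self.min_confirmations:
                self.dictionary.add(key)
        return key in self.dictionary


review = WordReview({"the", "modern"})
queue = review.suspect_queue(["the", "modem", "Arhus", "modern", "modem"])
first = review.record("alice", "modem", True)
repeat = review.record("alice", "modem", True)
second = review.record("bob", "modem", True)
remaining = review.suspect_queue(["modem", "Arhus"])
```

Here "modem" leaves the suspect queue only after two different users accept it, which is one simple way to model cross-identification by other users.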
In the final session of the tool, operators are shown the full page in context. This allows them to correct any outstanding letters, to identify false positives, and to correct the segmentation of the page (where words and letters have been incorrectly combined or split).
- Neudecker, C. and A. Tzadok, User Collaboration for Improving Access to Historical Text, LIBER2010 Annual Conference, 29 June – 1 July 2010, Arhus, Denmark. Also published as a paper in LIBER Quarterly, vol. 20 (2010) no.1.
- Tzadok, A. CONCERT – COoperative eNgine for Correction of ExtRacted Text. IMPACT Final Conference 2011, 24-25 October, London, UK.
- Fischer, M. An Introduction to the CONCERT tool developed for IMPACT by IBM by the IMPACT Centre of Competence.