Collaborative Correction Platform


Scenario


Use this tool when you want to clean up your transcription, either for e-book creation or for better search results. The tool allows dozens of users to work concurrently on the same book or newspaper. Once the adaptive OCR is in use, you can also benefit from its monitoring capabilities and therefore use external operators (either volunteers or a low-cost service provider). The tool incorporates several productivity features that support a fast and accurate verification process.

Abstract


The COoperative eNgine for Correction of ExtRacted Text (CONCERT) is a web-based platform, suitable for massive volunteer participation, which validates and corrects OCR results. In this way, it enables the general public to help with large scale digitisation efforts.

The technology streamlines, simplifies and accelerates the process of winnowing out questionable OCR output, enabling reviewers to key in corrections to the text. Instead of displaying an entire scanned page, reviewers only see the actual letters or words in question. For example, the letter combination “rn” may be indistinguishable from the letter “m”. In such cases, the system collects many samples of the letter “m” and places them next to the characters in question, making it much easier to determine their real identity.
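The character session can be pictured as a grouping step: low-confidence glyphs are batched by the character the OCR engine assigned to them and displayed beside confidently recognised samples of that same character. The sketch below is an illustration under stated assumptions, not CONCERT's actual implementation; the `Glyph` record, the 0.8 confidence threshold and the `build_character_panels` function are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record for one glyph image cropped from a scanned page.
@dataclass(frozen=True)
class Glyph:
    glyph_id: str      # identifier of the cropped glyph image (assumed)
    ocr_label: str     # what the OCR engine read, e.g. "m"
    confidence: float  # OCR confidence in [0, 1] (assumed scale)


def build_character_panels(glyphs, threshold=0.8, max_refs=5):
    """Group low-confidence glyphs by OCR label and pair each group with
    high-confidence reference samples carrying the same label."""
    references = defaultdict(list)
    suspects = defaultdict(list)
    for g in glyphs:
        if g.confidence >= threshold:
            references[g.ocr_label].append(g)
        else:
            suspects[g.ocr_label].append(g)

    panels = {}
    for label, group in suspects.items():
        # Show the most confident reference samples first.
        refs = sorted(references[label], key=lambda g: g.confidence, reverse=True)
        panels[label] = {
            "references": [g.glyph_id for g in refs[:max_refs]],
            "suspects": [g.glyph_id for g in group],
        }
    return panels
```

With two confident “m” samples and one doubtful glyph read as “m”, the function returns a single panel in which the doubtful glyph sits next to both references, so a reviewer can judge at a glance whether it is really an “m” or an “rn”.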

In cases where an entire word is suspect, it is added to a collection of other questionable terms, which are then arranged in alphabetical order. Volunteer reviewers need only accept or reject suggested substitutes with one keystroke. In addition, the system uses adaptive dictionary enrichment, a method in which new words are added to a central dictionary based on cross-identification and correction by other users.
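The word session and the adaptive dictionary enrichment described above can be approximated as follows: suspect words not yet in the central dictionary are queued alphabetically, each suggestion is settled with a single key, and a new word is admitted to the dictionary only after several distinct reviewers have confirmed it. This is a minimal sketch under assumptions; the `CentralDictionary` class, the confirmation threshold, the `y`/`n` key bindings and the case-insensitive sort are hypothetical, and the source does not specify CONCERT's actual enrichment rules.

```python
from collections import defaultdict


class CentralDictionary:
    """Hypothetical shared dictionary that admits a new word only after
    a minimum number of different reviewers have confirmed it."""

    def __init__(self, known_words, min_confirmations=3):
        self.words = set(known_words)
        self.min_confirmations = min_confirmations
        self._confirmed_by = defaultdict(set)  # candidate word -> reviewer ids

    def confirm(self, word, reviewer_id):
        """Record that reviewer_id accepted word; return True once it is admitted."""
        if word in self.words:
            return True
        self._confirmed_by[word].add(reviewer_id)  # repeat votes by one reviewer count once
        if len(self._confirmed_by[word]) >= self.min_confirmations:
            self.words.add(word)
            del self._confirmed_by[word]
            return True
        return False


def word_review_queue(suspect_words, dictionary):
    """Deduplicated, case-insensitively sorted queue of words not in the dictionary."""
    pending = {w for w in suspect_words if w not in dictionary.words}
    return sorted(pending, key=lambda w: (w.casefold(), w))


def apply_keystroke(original, suggestion, key):
    """One-keystroke decision: 'y' accepts the suggested substitute, 'n' rejects it."""
    if key == "y":
        return suggestion
    if key == "n":
        return original
    raise ValueError(f"unexpected key: {key!r}")
```

Counting distinct reviewer ids rather than raw votes keeps a single operator from pushing a misreading into the shared dictionary by confirming it repeatedly.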

In the final session of the tool, operators are shown the full page in context. This allows them to correct any outstanding letters, to identify false positives, and to correct the page segmentation (where words and letters have been incorrectly combined or split).
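Correcting segmentation in the page session boils down to two operations on word bounding boxes: merging boxes that were wrongly split and splitting a box that wrongly combines two words. The sketch assumes a simple axis-aligned box model in page pixels; `WordBox`, `merge_words` and `split_word` are hypothetical names for illustration, not CONCERT's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    text: str
    x0: int  # left edge in page pixels
    y0: int  # top edge
    x1: int  # right edge
    y1: int  # bottom edge


def merge_words(left, right):
    """Join two boxes that the OCR wrongly split (e.g. "News" + "paper")."""
    return WordBox(
        left.text + right.text,
        min(left.x0, right.x0), min(left.y0, right.y0),
        max(left.x1, right.x1), max(left.y1, right.y1),
    )


def split_word(word, char_index, x_split):
    """Split a box that wrongly combines two words: the text is cut at
    char_index and the image region at horizontal position x_split."""
    if not 0 < char_index < len(word.text):
        raise ValueError("char_index must fall strictly inside the word")
    if not word.x0 < x_split < word.x1:
        raise ValueError("x_split must fall strictly inside the box")
    first = WordBox(word.text[:char_index], word.x0, word.y0, x_split, word.y1)
    second = WordBox(word.text[char_index:], x_split, word.y0, word.x1, word.y1)
    return first, second
```

The merged box is the union of the two inputs, so a slight vertical misalignment between the halves (common on curved newspaper scans) is absorbed rather than rejected.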

CONCERT - Character session tab

Publications

  • Fischer, M. An Introduction to the CONCERT tool developed for IMPACT by IBM. IMPACT Centre of Competence.

Availability

The tool is available under a commercial licence. For further information on licensing, please contact the IBM-Israel IMPACT Group.

OCR Post-correction and Enrichment