Post-correction

OCR produces its best results from well-printed, modern documents. Historical documents, however, exhibit a range of defects that can reduce recognition accuracy: poor paper quality, poor typesetting, damage or degradation of the original paper source, and text skew or warping due to age or humidity. In addition, content-holding institutions tend to have legacy data: text-based digitised material that was not originally created with OCR in mind.

Such material yields unsatisfactory OCR accuracy, leaving the digitised content only partially discoverable and usable at best. IMPACT has therefore created a number of tools and modules that allow institutions and their users to correct and validate OCR text, either before publication or afterwards (by means of crowdsourcing).