PRImA Geometric Correction: Arbitrary Warping



Scenario


Correction of arbitrary warping effects in document images (e.g. due to humidity). There are two versions: a command line tool that can be used in automated workflows, and a GUI tool that allows manual intervention and supports showcase production as well as ground truth creation.

Abstract

Historical document images frequently show geometric distortions, mostly due to storage conditions (arbitrary warping) but also due to the original printing process (non-straight text lines), the use of the document (folds) and the scanning method (page curl). Correcting such distortions improves both recognition rate and visual appearance (e.g. for easier human reading or on-demand printing).

However, the nature of the documents, with layout irregularities and broken/touching characters of archaic fonts, poses significant challenges. In addition, for large-scale digitisation of books and newspapers, methods need to be robust, efficient and reversible, and must be applicable without supervision to (possibly multi-columned) documents that may or may not be warped (no distortion should be introduced on unwarped images). No such method exists in the literature.

Within IMPACT, an effective grid-based method has been developed to geometrically model and correct arbitrarily warped historical documents with relatively complex layout (multi-column with graphics). A global grid with sub-grids for differing parts of a page is constructed by accurately determining text baselines. The warped image is corrected by transforming each quadrilateral sub-grid of the global grid into its intended rectangular form. Preliminary experimental results show that this method efficiently corrects arbitrarily warped historical documents, with improved performance over a leading geometric correction method and the industry standard commercial system.
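To illustrate the idea of transforming a quadrilateral grid cell into a rectangle, the following is a minimal sketch only, not the PRImA implementation. It assumes the four corners of a cell are already known (the tool derives its grid from text baselines, which is not reproduced here), and it swaps in a simple bilinear blend of the corners with nearest-neighbour sampling. The function name `rectify_quad` and its parameters are hypothetical.

```python
import numpy as np

def rectify_quad(image, quad, out_h, out_w):
    """Resample one quadrilateral cell of `image` into an out_h x out_w rectangle.

    `quad` holds four (x, y) corners in the order
    top-left, top-right, bottom-right, bottom-left.
    Hypothetical helper for illustration; not part of any PRImA tool.
    """
    tl, tr, br, bl = [np.asarray(p, dtype=float) for p in quad]

    # Normalised position of every output pixel inside the rectangle.
    u = np.linspace(0.0, 1.0, out_w)[None, :, None]  # left -> right
    v = np.linspace(0.0, 1.0, out_h)[:, None, None]  # top -> bottom

    # Bilinear blend of the corners: the source position for each output pixel.
    top = (1 - u) * tl + u * tr
    bottom = (1 - u) * bl + u * br
    src = (1 - v) * top + v * bottom  # shape (out_h, out_w, 2)

    # Nearest-neighbour sampling, clamped to the image bounds.
    xs = np.clip(np.rint(src[..., 0]).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.rint(src[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[ys, xs]

# Usage: straighten a skewed cell of a synthetic 10 x 10 image.
page = np.arange(100).reshape(10, 10)
cell = rectify_quad(page, [(1, 1), (8, 2), (7, 8), (2, 7)], out_h=6, out_w=6)
```

Because each output pixel is computed from the cell corners, a cell that is already an axis-aligned rectangle is copied unchanged, which matches the requirement that unwarped images must not be distorted. A full correction would apply such a mapping to every cell of the grid and would likely use smoother interpolation than nearest-neighbour.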

Input

Output

Publications

Availability

For information on availability and licensing, please contact PRImA Research.
