Parallel Session 1: Research was dedicated to presentations and discussions of the state-of-the-art research tools for document analysis developed through the IMPACT Project.
As you might guess from the slides below, the information packed into these presentations could fill a whole new two-day conference! For now, though, a brief summary will have to suffice, and I implore you to visit the tools section of the freshly launched Impact: Centre of Competence website for more details.
A video of the session is available here:
Impact Tools Developed by NCSR (Basilis Gatos)
The folks at the Computational Intelligence Laboratory of the National Centre of Scientific Research (DEMOKRITOS) in Athens focus their activity on "research & development methods, techniques and prototypes in the areas of telecommunication systems, networks, and informatics". Involved with IMPACT since 2008, they have partnered in the production of nine software tools supporting binarisation, border removal, page splitting, page curl correction, OCR result, character segmentation and word spotting.
OCR for Typewritten Documents (Stefan Pletschacher)
Stefan explained that typewritten documents from roughly the 1870s to the 1970s pose a unique challenge to OCR. He pointed out that each character is produced on the page independently of the rest, so characters can appear with different weights due to the mechanical nature of the process, even within the same word. Typical typewritten documents in archives are carbon copies with blurred type and a textured background; they are also usually administrative documents, rife with names, abbreviations and numbers, which makes typical lexicon-based recognition approaches less useful. A system was developed in IMPACT to tackle these issues by incorporating background knowledge of typewritten documents and by improving segmentation and enhancement of glyph images, while "performing language independent character recognition using specifically trained classifiers".
Image Enhancement, Segmentation and Experimental OCR (A. Antonacopoulos)
Representing the work of PRImA (Pattern Recognition & Image Analysis Research) at the University of Salford, Apostolos demonstrated their approach to the digitisation workflow and the tools developed for image enhancement (border removal, page curl removal, correction of arbitrary warping) as well as segmentation (recognition-based and standalone).