After the lunch break, the DATeCH conference continued with its third session, chaired by Martin Reynaert (Tilburg University), which focused on the post-correction of OCR output.
John Evershed, from Project Computing, described a system for automatic post-OCR text correction of digital collections of historical texts. The system is based on a “noisy channel” approach and avoids manual correction.
Watch the video:[cvm_video id=”2093″]
Günter Mühlberger, from Innsbruck University, introduced a new approach to the correction of noisy OCR text which combines the power of crowdsourcing with information retrieval technology. It displays the word snippets that match a specific search string and lets users validate each snippet with a simple yes/no decision.
You can watch the video:[cvm_video id=”2072″]
Christoph Ringlstetter, from Gini GmbH, presented a new tool which visualizes possible OCR errors, and series of similar possible OCR errors, in a given input document, and therefore allows multiple errors to be corrected in one pass.
At the end of this session, representatives of Tecnilógica introduced their company, followed by a coffee break in the companies' exhibition hall.
Watch the video:[cvm_video id=”2124″]