The ICDAR2019 Competition on Post-OCR Text Correction (POCR) invites researchers from any field applicable to document analysis (e.g. natural language processing, data analysis, text data mining) to test their method(s) for improving/denoising OCR-ed texts on a testbed of more than 20 million characters. Given the noisy OCR of printed text in 10 languages (English, French, German, Finnish, Spanish, Dutch, Czech, Bulgarian, Slovak and Polish), participants are invited to take part in two tasks: detecting and/or correcting OCR errors.
How to participate
All participants are invited to:
- Register on the website
- Train method(s) on the training dataset
- Test method(s) on the testing dataset
- Submit results and method descriptions
Challenge tracks
- Detection of OCR errors: given the raw OCR-ed text, the participants are asked to provide the position and the length of the suspected errors.
- Correction of OCR errors: given the OCR errors in their context, the participants are asked to provide one or a ranked list of candidates for correction.
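To make the two tracks concrete, here is a minimal sketch in Python. It is not the competition's official format or baseline: the toy lexicon, the function names `detect_errors` and `rank_candidates`, and the use of `(position, length)` tuples are assumptions made for illustration only. Detection flags tokens missing from the lexicon, and correction ranks lexicon entries by string similarity.

```python
import re
from difflib import get_close_matches

# Toy lexicon (assumption): a real system would use a large dictionary
# or a language model for each of the target languages.
LEXICON = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]


def detect_errors(text, lexicon=LEXICON):
    """Detection track sketch: return (position, length) for each
    token that is not found in the lexicon."""
    known = set(lexicon)
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(r"\w+", text)
        if m.group().lower() not in known
    ]


def rank_candidates(text, position, length, lexicon=LEXICON, n=3):
    """Correction track sketch: return up to n lexicon entries,
    most similar first, for the suspected error at the given span."""
    token = text[position:position + length].lower()
    return get_close_matches(token, lexicon, n=n, cutoff=0.6)


ocr_text = "The qu1ck brovvn fox"
for position, length in detect_errors(ocr_text):
    suspect = ocr_text[position:position + length]
    print(position, length, suspect, rank_candidates(ocr_text, position, length))
```

A lexicon lookup misses real-word errors (a misread word that is still a valid word), so a competitive method would also need to use the surrounding context.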
Important dates
- 1st February 2019: Registration
- mid-February March 2019: Training set sent to participants
- 30 March 2019: Registration
- 24 Apr. 2019: Testing set sent to participants
- 26 Apr. 2019: Result submission to the organizers
- 21 Sept. 2019: Results notification
For further information, please visit https://sites.google.com/view/icdar2019-postcorrectionocr/home?authuser=0