The 2-day CIS OCR Workshop on “OCR and postcorrection of early printings for digital humanities”, originally held at LMU Munich on 14/15 September 2015 (see http://www.cis.lmu.de/ocrworkshop).
GT4HistOCR contains ground truth for research in Optical Character Recognition (OCR) technology applied to historical printings in German Fraktur and Early Modern Latin.
Example and evaluation dataset used for the ICDAR2013 Competition on Historical Book Recognition.
Example and evaluation dataset used for the ICDAR2017 Competition on Recognition of Early Indian Printed Documents.
This dataset contains scans of index cards from the UK’s Natural History Museum Lepidoptera index.
The corpus accounts for 22M OCRed characters along with the corresponding Gold Standard (GS).
More than half a million representative text-based images compiled by a number of major European libraries.