The 2-day CIS OCR Workshop on “OCR and postcorrection of early printings for digital humanities” was originally held at LMU Munich on 14/15 September 2015 (see http://www.cis.lmu.de/ocrworkshop).
GT4HistOCR: Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin
GT4HistOCR contains ground truth for research in Optical Character Recognition (OCR) technology applied to historical printings in German Fraktur and Early Modern Latin.
Abraham, Belgian Newspaper Catalogue
Abraham, the Belgian newspaper catalogue, is the catalogue of Belgian newspapers published since 1800.
Dataset of the ICDAR 2019 Competition on Post-OCR Text Correction
The corpus comprises 22 million OCRed characters, together with the corresponding Gold Standard (GS).
IMPACT
IMPACT provides more than half a million representative text-based images compiled by a number of major European libraries.
ASV ToolBox
ASV Toolbox is a modular collection of tools for the exploration of written language data.