Image and Ground Truth Resources
A carefully selected subset of these images has been reproduced with accompanying “ground truth”. In digital imaging and OCR, ground truth is the objective verification of particular properties of a digital image, used to test the accuracy of automated image analysis processes. The ground truth of an image’s text content, for instance, is the complete and accurate record of every character and word in the image. It can be compared with the output of an OCR engine to assess the engine’s accuracy and to judge how significant any deviation from the ground truth is in a given case.
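One common way to quantify how far OCR output deviates from ground truth is the character error rate (CER): the Levenshtein edit distance between the two texts divided by the length of the ground truth. The sketch below is a generic illustration, not necessarily the evaluation method used by IMPACT, and the function names are hypothetical. The same distance function applied to lists of words gives a word error rate (WER). Note that CER can exceed 1.0 when the OCR output is much longer than the ground truth.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions needed to
    turn sequence a into sequence b (works on strings or lists of words)."""
    # Only the previous row of the dynamic-programming table is kept.
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Character edit distance divided by the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


def word_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Word-level edit distance divided by the number of ground-truth words."""
    gt_words = ground_truth.split()
    if not gt_words:
        raise ValueError("ground truth must contain at least one word")
    return levenshtein(gt_words, ocr_output.split()) / len(gt_words)


if __name__ == "__main__":
    gt, ocr = "Ground truth", "Gr0und trnth"
    print(character_error_rate(gt, ocr))  # 2 errors / 12 characters
    print(word_error_rate(gt, ocr))       # both words wrong -> 1.0
```

Because a single misread character makes a whole word wrong, WER is usually much higher than CER for the same output, which is one reason both figures are often reported.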
The ground truth provided by the IMPACT Centre of Competence is stored and exchanged as XML instances in the Page Analysis and Ground-truth Elements (PAGE) format, which was developed by the University of Salford and is maintained at http://schema.primaresearch.org/PAGE. A paper explaining the development of PAGE was presented by the University of Salford at ICPR 2010.
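As a rough illustration of working with PAGE files, the sketch below uses Python's standard `xml.etree.ElementTree` to pull the text of each `TextRegion` out of a PAGE document. PAGE describes a page as a hierarchy (`Page`, `TextRegion`, `TextLine`, `Word`, `Glyph`), and text content sits in `TextEquiv/Unicode` elements at each level. The namespace URI changes between PAGE schema versions, so the code matches on local element names rather than a fixed namespace. The function names and the `SAMPLE` document are hypothetical; the sample is trimmed for brevity (it omits the `Metadata` element the schema requires) and is not taken from the IMPACT dataset.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def region_texts(page_xml: str) -> Dict[str, str]:
    """Map each TextRegion id to the Unicode text of its own TextEquiv."""
    root = ET.fromstring(page_xml)
    texts: Dict[str, str] = {}
    for el in root.iter():
        if local_name(el.tag) != "TextRegion":
            continue
        # Only TextEquiv elements that are direct children of the region are
        # considered; TextLine and Word elements carry their own TextEquiv.
        for child in el:
            if local_name(child.tag) != "TextEquiv":
                continue
            for unicode_el in child:
                if local_name(unicode_el.tag) == "Unicode":
                    texts[el.get("id", "")] = unicode_el.text or ""
            break  # use the first region-level TextEquiv only
    return texts


# Hypothetical, trimmed PAGE document (2013-07-15 namespace) for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.tif" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="100,100 1900,100 1900,200 100,200"/>
        <TextEquiv><Unicode>Ground truth</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode>Ground truth</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(region_texts(SAMPLE))  # {'r1': 'Ground truth'}
```

Text extracted this way can be fed directly into an accuracy comparison against OCR output for the same region.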
The IMPACT dataset is mainly distributed under an attribution, non-commercial, share-alike license, but please check each dataset for details of its licensing terms.
A copy of this dataset with further browsing features can also be found at https://www.primaresearch.org/datasets.
An introduction to Ground Truth Production with the IMPACT Project, by Ines Jerele of the NUK (National and University Library of Slovenia). It also includes some background on ground truth.