  • Description: More than half a million representative text-based images compiled by a number of major European libraries. Covering texts from as early as 1500, and containing material from newspapers, books, pamphlets and typewritten notes, the dataset is an invaluable resource for future research into imaging technology, OCR and language enrichment. It also includes ca. 50,000 ground-truth (GT) files.
  • Scope: Layout analysis, OCR, post-correction
  • License: CC Attribution-NonCommercial-NoDerivatives (or equivalent); CC Attribution-NonCommercial-ShareAlike (or equivalent); CC Attribution-ShareAlike (or equivalent); public domain
  • Content type: Ground truth, images, metadata
  • Size: ca. 500,000 images and ca. 50,000 GT files
  • Language:
  • Owner: British Library, Bibliothèque nationale de France, Biblioteca Nacional de España, National Library of the Netherlands, Biblioteca Virtual Miguel de Cervantes, Poznan Supercomputing and Networking Centre, National Library of Slovenia, Bavarian State Library, National Library of Bulgaria, National Library of the Czech Republic
  • Contact: tech.support@digitisation.eu
  • Link: https://www.digitisation.eu/tools-resources/image-and-ground-truth-resources/