The dataset produced by IMPACT is a landmark and an invaluable resource for OCR and language technology research on historical documents. It comprises over half a million images from the European libraries participating in IMPACT and an unprecedented number of ground truth files, more than 50,000 in total. These files contain a high level of detail, with full Unicode-encoded text (including ligatures and special characters) and complete layout information (segmentation, region metadata and reading order). The dataset will foster further research and development for years to come.
