Succeed training materials
These materials have been produced within the framework of the Succeed project (www.succeed-project.eu). They are available in PDF format, as standard SCORM packages that allow direct integration into educational platforms such as Moodle, and as a wiki to be extended by the community.
Image processing

Document Deskewer
The Document Deskewer is a simple, easy-to-use command-line tool for automatically correcting skewed pages.
Image processing
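This is not necessarily the method the Document Deskewer uses, but automatic skew correction is often done with a projection profile. The coordinates of foreground (ink) pixels are rotated by a range of candidate angles, and the angle at which the horizontal projection is sharpest (text lines piling up in the fewest rows) is kept. The sketch below assumes the ink pixels are already available as `(x, y)` points. `profile_score` and `estimate_skew` are hypothetical names, and the ±10° search range and 0.1° step are illustrative assumptions.

```python
import math


def profile_score(points, angle_deg, bin_size=1.0):
    """Sum of squared row counts after rotating the points by -angle_deg.

    When the rotation cancels the skew, each text line falls into very few
    rows, which maximises this score.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    rows = {}
    for x, y in points:
        # Row (vertical) coordinate of the point after rotating by -theta.
        row = math.floor((-x * sin_t + y * cos_t) / bin_size)
        rows[row] = rows.get(row, 0) + 1
    return sum(count * count for count in rows.values())


def estimate_skew(points, max_angle=10.0, step=0.1):
    """Return the angle (degrees) in [-max_angle, max_angle] with the sharpest profile."""
    n_steps = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(-n_steps, n_steps + 1):
        angle = i * step
        score = profile_score(points, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

To straighten the page, rotate it by the negative of the estimated angle. The sign convention depends on whether `y` grows upwards or downwards in your image coordinates. The cost grows with (number of angles × number of ink pixels), so downsampling the image first keeps the search affordable.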

GIMP
GIMP is best known as a free alternative to professional graphic editing programs such as Adobe Photoshop.
Image processing

ImageMagick
ImageMagick is a software suite to create, edit, compose, or convert bitmap images. It can read and write images in over 100 formats.
Image processing
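ImageMagick is typically scripted from the command line for batch preparation of scans. The sketch below builds ImageMagick 7 `magick` command lines that straighten each page with the `-deskew` option and convert it to grayscale, writing the result in the format implied by the output file extension. The ImageMagick documentation suggests a deskew threshold of 40% for most images. ImageMagick 6 uses `convert` instead of `magick`. The function names, the `*.tif` input pattern and the `.png` output suffix are hypothetical choices for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def deskew_command(src, dst, threshold="40%", grayscale=True):
    """Build an ImageMagick 7 command line that deskews src and writes dst."""
    cmd = ["magick", str(src), "-deskew", threshold]
    if grayscale:
        cmd += ["-colorspace", "Gray"]
    cmd.append(str(dst))  # output format is taken from the file extension
    return cmd


def convert_folder(src_dir, dst_dir, pattern="*.tif", suffix=".png"):
    """Deskew and convert every matching file; requires `magick` on PATH."""
    if shutil.which("magick") is None:
        raise RuntimeError("ImageMagick 7 'magick' executable not found on PATH")
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob(pattern)):
        dst = out / (src.stem + suffix)
        subprocess.run(deskew_command(src, dst), check=True)
```

Keeping command construction in its own function makes it easy to log or test the exact arguments before running them over a whole collection.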
OCR – Training & Evaluation

Cutouts and page-generator
It is a web application for creating custom Tesseract recognition profiles for specific kinds of documents.
OCR Training

Abbyy FineReader Engine 10
ABBYY FineReader is a widely used, well-documented commercial product for text recognition in images.
Text Recognition

OmniPage OCR
It is a robust optical character recognition (OCR) application from Nuance Communications that supports over 120 languages.
Text Recognition
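The output of engines such as ABBYY FineReader or OmniPage is commonly evaluated against a manually transcribed ground truth using the character error rate (CER): the edit (Levenshtein) distance between the OCR output and the reference, divided by the length of the reference. A minimal sketch using only the standard library (the function names are hypothetical):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

`levenshtein` works on any sequences, so the word error rate follows from the same function applied to `reference.split()` and `hypothesis.split()`. CER can exceed 1.0 when the OCR output contains many spurious characters.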
Post-correction

Virtual Transcription Laboratory
It is a crowdsourcing platform developed by PSNC to support the creation of searchable representations of historic textual documents.
Post-correction
Layout analysis

Newspaper Segmentation and Korrektor
It is a tool for manually post-correcting automatically processed newspaper scans: users can correct the automatically detected layout elements, texts and other properties.
Layout analysis
Named Entities (NE) Recognition & Resolution

DBPedia Spotlight
It is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources.
NE Recognition & Resolution
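DBpedia Spotlight is usually called through its REST interface. The sketch below assumes the public endpoint `https://api.dbpedia-spotlight.org/en/annotate`, the `text` and `confidence` query parameters, and a JSON response whose `Resources` entries carry `@surfaceForm`, `@offset` and `@URI` fields. Check these against the documentation of the instance you use. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; self-hosted instances use their own URL.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5, endpoint=SPOTLIGHT_URL):
    """Prepare a GET request asking Spotlight for a JSON annotation of text."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(f"{endpoint}?{query}",
                                  headers={"Accept": "application/json"})


def extract_entities(response):
    """Turn a parsed response into (surface form, character offset, DBpedia URI) triples."""
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in response.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Call the service over the network and return the linked entities."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=30) as resp:
        return extract_entities(json.load(resp))
```

Only `annotate` touches the network; `build_request` and `extract_entities` are kept separate so they can be checked offline. A higher `confidence` value trades recall for precision.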

Frog Named Entity Recognition
It is an integration of memory-based natural language processing (NLP) modules developed for Dutch.
NE Recognition & Resolution

NLTK (Natural Language Toolkit)
It is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources.
NE Recognition & Resolution

Stanford NER
Stanford NER is a tool that can mark and extract named entities (persons, locations, organizations or even titles) from a text file.
NE Recognition & Resolution
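Stanford NER can write its results in the `slashTags` format, where each token is followed by `/` and its tag, for example `Mark/PERSON Twain/PERSON lived/O in/O Hartford/LOCATION`. The sketch below groups consecutive tokens that share a non-`O` tag into entities. It assumes whitespace-separated tokens and plain tags without B-/I- prefixes, as produced by the standard English models, so two adjacent entities of the same type are merged into one. The function name is hypothetical.

```python
def parse_slash_tags(line, outside="O"):
    """Return (entity text, tag) pairs from one line of word/TAG tokens."""
    entities = []
    words, current_tag = [], None
    for token in line.split():
        # Split on the last "/" so tokens such as "1/2/O" keep their slash.
        word, sep, tag = token.rpartition("/")
        if not sep:  # untagged token: treat it as outside any entity
            word, tag = token, outside
        if tag == current_tag and tag != outside:
            words.append(word)  # continue the current entity
            continue
        if current_tag not in (None, outside):
            entities.append((" ".join(words), current_tag))
        words, current_tag = [word], tag
    if current_tag not in (None, outside):  # flush an entity ending the line
        entities.append((" ".join(words), current_tag))
    return entities
```

For example, `parse_slash_tags("Mark/PERSON Twain/PERSON lived/O in/O Hartford/LOCATION ./O")` yields the person "Mark Twain" and the location "Hartford".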
Miscellaneous tools
Other training opportunities
The IMPACT Centre of Competence aims to provide a wide range of training opportunities for institutions engaged in or embarking on a text-based digitisation project or programme.