Succeed training materials


These materials were produced within the framework of the Succeed project (www.succeed-project.eu). They are available as PDF files, as standard SCORM packages for direct integration into educational platforms such as Moodle, and as wiki pages that the community can extend.


Image processing

Document Deskewer

The Document Deskewer is a simple, easy-to-use command-line tool for automatically correcting skewed pages.

Image processing

GIMP

GIMP is best known as a free alternative to professional graphic editing programs such as Adobe Photoshop.

Image processing

ImageMagick

ImageMagick is a software suite to create, edit, compose, or convert bitmap images. It can read and write images in over 100 formats.

Image processing

Page Curl Correction

Page Curl Correction is a command-line tool that detects page-curl distortions in scanned pages and corrects them automatically.

Image processing

Scan Tailor

Scan Tailor is an interactive tool for post-processing scanned pages. It can cut or crop pages, compensate for skew angle, and more.

Image processing

OCR – Training & Evaluation

Cutouts and page-generator

It is a web application for creating a custom Tesseract recognition profile for a specific kind of document.

OCR Training

PDF | WIKI

ABBYY FineReader Engine 10

ABBYY FineReader is a widely used, well-documented commercial product for text recognition in images.

Text Recognition

PDF | WIKI | SCORM

OmniPage OCR

OmniPage is a robust optical character recognition (OCR) application from Nuance Communications that supports over 120 languages.

Text Recognition

PDF | WIKI | SCORM

Tesseract 3.02

Tesseract is the most widely used open-source OCR application. It supports typed, handwritten, or printed text and a wide variety of languages.

Text Recognition

PDF | WIKI | SCORM

ocrevalUAtion

It compares a reference text with OCR output. It can also compare the output of two different OCR engines, or of one engine run with different options.

OCR evaluation

Site | WIKI | SCORM
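
As a rough illustration of what comparing a reference text with OCR output involves, the Python sketch below computes a character error rate from the Levenshtein edit distance. It is not ocrevalUAtion's own code or interface. The function names `edit_distance` and `character_error_rate` and the sample strings are assumptions made for this example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    needed to turn `reference` into `hypothesis`."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: ground truth against a typical OCR misreading.
    reference_text = "historic document"
    ocr_output = "hist0ric docurnent"
    print(character_error_rate(reference_text, ocr_output))
```

The same two functions can be applied to the output of two OCR engines by treating one of them as the reference, although the resulting figure then measures disagreement between the engines and not accuracy.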

Post-Correction

Virtual Transcription Laboratory

It is a crowdsourcing platform developed by PSNC to support the creation of searchable representations of historic textual documents.

Post-correction

PDF | WIKI

Layout analysis

Newspaper Segmentation and Korrektor

It is a manual post-correction tool for automatically processed newspaper scans. It automatically corrects detected layout elements, texts and other properties.

Layout analysis

PDF | WIKI | SCORM

Text processing

AlchemyAPI

AlchemyAPI provides an online service for keyword extraction, as well as sentiment extraction, text categorization, and more.

Text processing

PDF | WIKI | SCORM

CoBaLT

It is an application into which a corpus of texts can be loaded in order to annotate its tokens (lemmatization and more).

Text processing

PDF | WIKI | SCORM

Named Entities (NE) Recognition & Resolution

DBpedia Spotlight

It is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources.

NE Recognition & Resolution

PDF | WIKI | SCORM

Frog Named Entity Recognition

It is an integration of memory-based natural language processing (NLP) modules developed for Dutch.

NE Recognition & Resolution

PDF | WIKI

NE Attestation tool

It is a multi-purpose GUI used in the production of computational lexica and gold-standard data for NE tagging.

NE Recognition & Resolution

PDF | WIKI | SCORM

NERT (Named Entities Recognition Tool)

It is a tool that can mark and extract named entities (persons, locations, organizations or even titles) from a text file.

NE Recognition & Resolution

PDF | WIKI | SCORM

NLTK (Natural Language Toolkit)

It is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources.

NE Recognition & Resolution

PDF | WIKI | SCORM

Stanford NER

Stanford NER is a tool that can mark and extract named entities (persons, locations, organizations or even titles) from a text file.

NE Recognition & Resolution

PDF | WIKI | SCORM

Miscellaneous tools

JHOVE2

JHOVE2 is open source software for format-aware characterization of digital objects.

Miscellaneous tools

PDF | WIKI | SCORM

INL Lexicon service

It is a web service that gives any piece of software quick online access to a lexicon by means of HTTP requests.

Miscellaneous tools

PDF | WIKI | SCORM

Other training opportunities

The IMPACT Centre of Competence aims to provide a wide range of training opportunities for institutions engaged in or embarking on a text-based digitisation project or programme.