Demonstrator platform


Acknowledgement: This platform was created by the IMPACT project and further developed by Succeed (a project funded by the European Union under FP7-ICT).

The Impact Centre of Competence Demonstrator platform allows users to test a number of tools online without installing any software on their computers. These tools cover all steps in the digitisation workflow, such as image conversion, image enhancement, OCR and evaluation.

Member

Impact Centre of Competence members are able to choose their own configuration and use their own input files.

Registered user

Registered users can test the tools with default parameters.

Image Conversion


GraphicsMagick

GraphicsMagick provides a robust and efficient collection of tools and libraries that support reading, writing, and manipulating images in over 88 major formats, including DPX, GIF, JPEG, JPEG-2000, PNG, PDF, PNM, and TIFF.
Source: Project GraphicsMagick.

Image Enhancement


Ocropus Binarisation and Dewarping service

Performs binarisation and dewarping using the Ocropus technology.

Segmentation


Fraunhofer Newspaper Segmenter & Korrektor

The Korrektor is a manual post-correction tool for automatically processed newspaper scans. By loading the resulting XML files into the software, it is possible to correct automatically detected layout elements, texts and other properties.

OCR Engines


Tesseract 3.03 OCR Service

Performs OCR on an input image file using Tesseract 3.03 technology.

Evaluation


IMPACT INL OCR Evaluation Service

Performs OCR evaluation by comparing the results with ground truth.
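The page does not say which metric the evaluation service reports. As an illustration only, the sketch below assumes a character error rate based on Levenshtein edit distance, which is one common way to compare OCR output with ground truth. The function names `edit_distance` and `character_error_rate` are hypothetical and are not part of the service.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance divided by the ground-truth length (assumed metric)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ocr_text, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # One wrong character out of five in the ground truth.
    print(character_error_rate("he1lo", "hello"))
```

A lower value means the OCR output is closer to the ground truth. The rate can exceed 1.0 when the OCR output is much longer than the ground truth.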

 

Tools classified according to their purpose

Image conversion

  • GraphicsMagick
    GraphicsMagick provides a robust and efficient collection of tools and libraries that support reading, writing, and manipulating images in over 88 major formats, including DPX, GIF, JPEG, JPEG-2000, PNG, PDF, PNM, and TIFF.
  • ImageMagick conversion to PGM
    Converts an image into Portable Graymap format (PGM) using ImageMagick.
  • IMPACT OpenJPEG Conversion Service
    Performs conversion from JPEG2000 to other image file formats such as TIFF, BMP and RAW. The implementation is based on the OpenJPEG library.
  • Kakadu
    Kakadu is a (commercial) software library for the encoding and decoding of images in JPEG2000 format.
  • ExifTool
    ExifTool is a free software program for reading, writing, and manipulating image, audio, and video metadata. It is platform independent and available as both a Perl library (Image::ExifTool) and a command-line application. ExifTool is commonly incorporated into different types of digital workflows and supports many types of metadata including Exif, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, as well as the manufacturer-specific metadata formats of many digital cameras.

Image Enhancement

  • ImageMagick Border removal
    Performs image enhancement by automatically detecting and removing black borders as well as noise regions from scanned document image files using ImageMagick.
  • Galfar’s Lair Deskew
    Straightens an image to improve the detection of structures.
  • Fraunhofer IAIS mydec Color Binarize
    Color Binarize separates letters from the background by converting grayscale images to binary. The optimal contrast threshold for the separation can be calculated either for the entire image or for each pixel.
  • Fraunhofer IAIS mydec Deshadow
    The natural aging of paper can cause the contrast between paper and writing to deteriorate. Fraunhofer IAIS mydec Deshadow removes these aging effects automatically to support subsequent processing.
  • Ocropus binarisation and dewarping service
    Performs binarisation (converting pixels to black and white) and dewarping (perspective correction) using the Ocropus technology.
  • Unpaper
    unpaper is a post-processing tool for scanned sheets of paper, especially for book pages scanned from previously created photocopies. It tries to remove dark edges, correct the rotation (“deskew”), and align the centering of pages.
  • ABBYY FineReader 11 Binarisation Service
    Performs binarisation using the ABBYY FineReader 11 technology.
  • Scan Tailor
    Scan Tailor is an interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be printed or assembled into a PDF or DJVU file.
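
Several of the tools listed above binarise an image, that is, convert each pixel to black or white. The sketch below shows the simplest form of this idea, a single global threshold applied to a grayscale image held as a nested list. It is an assumption-laden illustration and not the algorithm used by Ocropus, Fraunhofer IAIS mydec or ABBYY FineReader. The function name `binarise` and the default of using the mean intensity as the threshold are hypothetical choices.

```python
from typing import List, Optional


def binarise(gray: List[List[int]], threshold: Optional[float] = None) -> List[List[int]]:
    """Map each grayscale pixel (0-255) to 1 (light) or 0 (dark).

    If no threshold is given, the mean intensity of the whole image is used
    (an assumed default, chosen only to keep the example short).
    """
    pixels = [value for row in gray for value in row]
    if not pixels:
        raise ValueError("image must contain at least one pixel")
    if threshold is None:
        threshold = sum(pixels) / len(pixels)
    return [[1 if value > threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    page = [
        [10, 200],
        [90, 250],
    ]
    print(binarise(page))                # threshold = mean intensity (137.5)
    print(binarise(page, threshold=50))  # explicit global threshold
```

A per-pixel (local) threshold follows the same pattern but computes the threshold from a window around each pixel, which tends to cope better with uneven lighting or stained paper.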

Segmentation

  • IMPACT ABBYY FineReader 10 PAGE Segmentation Service
    Performs segmentation of an input image file using ABBYY FineReader 10 and exports the results in PAGE format.
  • Fraunhofer Newspaper Segmenter & Korrektor
    The Korrektor is a manual post-correction tool for automatically processed newspaper scans. By loading the resulting XML files into the software, it is possible to correct automatically detected layout elements, texts and other properties. The scanned documents are displayed in two separate windows to allow for a detailed inspection. Results can be edited using context menus, drag and drop and keyboard shortcuts.

OCR

OCR Training

  • Cutouts
    Cutouts supports the preparation of training material for an OCR system. Proper training material is understood here as a set of shapes (areas) separated from the source document that make up the font used to print that document.

OCR Engines

  • ABBYY FineReader 11 SDK version
    ABBYY FineReader is optical character recognition (OCR) software that provides text recognition and conversion capabilities, reducing the need to retype and reformat documents. One-click automated tasks let you do more in fewer steps. Up to 190 languages are supported for text recognition.
  • ABBYY FineReader 11 with Impact User dictionaries
    Performs OCR using the ABBYY FineReader technology with the external dictionaries developed during the Impact project.
  • IMPACT Tesseract 3.03 OCR Service
    Performs OCR on an input image file using Tesseract 3.03 technology.
  • Tesseract PAGE XML output v1.3
    Performs OCR on an input image file using Tesseract 3.03 technology and exports the output into PAGE XML format.
  • IMPACT Tesseract 4.0 OCR Service
    Performs OCR on an input image file using Tesseract 4.0 technology.
  • GOCR
    GOCR is an OCR (Optical Character Recognition) program, developed under the GNU General Public License. It converts scanned images of text back to text files.
  • Ocrad
    GNU Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method. It reads images in pbm (bitmap), pgm (greyscale) or ppm (color) formats and produces text in byte (8-bit) or UTF-8 formats.
  • Gamera OCR Module
    Gamera is a Python framework for building document analysis applications.
  • CuneiForm
    CuneiForm is a software tool for optical character recognition. It was originally developed at Cognitive Technologies and, after a few years with no development, released as freeware on December 12, 2007. The kernel of the OCR engine was released under the open source BSD license at the beginning of April 2008.

Evaluation

File Type and Encoding

Other

Storage Estimator

  • IMPACT Storage Estimator
    This tool will estimate the storage required for the images and OCR output files made within your digitisation workflow.
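
The page does not describe how the estimator calculates its result. As a rough sketch of the kind of arithmetic involved, the example below assumes that storage is the number of pages multiplied by the per-page image size (uncompressed size divided by a compression ratio) plus a per-page OCR output size. All function names, parameters and example figures are hypothetical.

```python
def uncompressed_image_bytes(width_in: float, height_in: float,
                             dpi: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed page image, from physical size and resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def estimate_storage_bytes(pages: int, image_bytes: int,
                           compression_ratio: float = 1.0,
                           ocr_bytes_per_page: int = 0) -> float:
    """Total bytes for all page images plus their OCR output (assumed model)."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    per_page = image_bytes / compression_ratio + ocr_bytes_per_page
    return pages * per_page


if __name__ == "__main__":
    # Hypothetical project: 1000 pages of 8 x 10 inches at 300 dpi, 24-bit colour,
    # stored with 10:1 compression and about 50 kB of OCR output per page.
    page_bytes = uncompressed_image_bytes(8, 10, 300, 24)
    total = estimate_storage_bytes(1000, page_bytes,
                                   compression_ratio=10,
                                   ocr_bytes_per_page=50_000)
    print(f"{total / 1e9:.2f} GB")
```

Real projects would also need to account for master versus derivative copies, metadata and backups, none of which are modelled here.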

Cost Estimator

Other

  • IMPACT INL Named Entities Recognition Service
    Performs recognition and tagging of named entities (persons, locations and organizations) in a text file.
  • INL Lemmatizer
    INL-developed tagger-lemmatizer for historical Dutch, where the tagger is trained on the “Letters as loot” corpus and the lemmatizer is based on the INL historical lexicon.
  • Decompress
    This tool decompresses a compressed file. It is useful in the execution of workflows.
  • JHOVE2
    The JHOVE2 project generalizes the concept of format characterization to include identification, validation, feature extraction, and policy-based assessment.
  • Stanford NER
    Stanford NER is a tool that can mark and extract named entities (persons, locations, organizations or even titles) from a text file. It uses a supervised learning technique, which means it has to be trained with a manually tagged training file before it is applied to other text.
  • Mallet
    MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
  • Catlinux
  • BVC Geonames Disambiguation
    Disambiguation tool for geographic locations using external repositories such as Wikidata and Geonames. Available under the MIT License.