Demonstrator platform

The IMPACT Demonstrator platform lets users test a number of tools online without installing any software locally. Users can access the platform in one of two roles:

  • Registered users can test the tools with default parameters.
  • IMPACT Centre of Competence members can choose their own configuration and use their own input files.

The tools are listed below, grouped by purpose:

Image Conversion

  • GraphicsMagick
    GraphicsMagick provides a robust and efficient collection of tools and libraries that support reading, writing, and manipulating images in over 88 major formats, including DPX, GIF, JPEG, JPEG-2000, PNG, PDF, PNM, and TIFF. Learn more.
  • ImageMagick conversion to PGM
    Converts an image into Portable Graymap format (PGM) using ImageMagick.
  • IMPACT OpenJPEG Conversion Service
    Converts between the JPEG2000 and TIFF image file formats, in either direction. The implementation is based on the OpenJPEG library.
  • Kakadu
    Kakadu is a (commercial) software library for the encoding and decoding of images in JPEG2000 format.
  • Gimp Image Conversion
    GIMP is a raster graphics editor used for image retouching and editing, free-form drawing, resizing, cropping, photo-montages, converting between different image formats, and more specialized tasks.
  • ImageMagick Conversion
  • ExifTool
    ExifTool is a free software program for reading, writing, and manipulating image, audio, and video metadata. It is platform independent, available as both a Perl library (Image::ExifTool) and command-line application. ExifTool is commonly incorporated into different types of digital workflows and supports many types of metadata including Exif, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, as well as the manufacturer-specific metadata formats of many digital cameras.
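
As an illustration of the PGM target format mentioned above, here is a minimal sketch of converting RGB pixels to grayscale and serializing them as a plain-text PGM ("P2") file. It does not reproduce how ImageMagick or any other tool on the platform performs the conversion: the function name `to_plain_pgm` is hypothetical, and the luma weights are an assumed, commonly used choice.

```python
def to_plain_pgm(rgb_rows):
    """Serialize rows of (r, g, b) tuples (each 0-255) as a plain PGM (P2) string."""
    height = len(rgb_rows)
    width = len(rgb_rows[0]) if height else 0
    # Plain PGM header: magic number, "width height", maximum gray value.
    lines = ["P2", f"{width} {height}", "255"]
    for row in rgb_rows:
        # Assumption: Rec. 601 luma weights for the RGB-to-gray conversion.
        grays = [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        lines.append(" ".join(str(g) for g in grays))
    return "\n".join(lines) + "\n"
```

Calling `to_plain_pgm([[(255, 255, 255), (0, 0, 0)]])` yields a two-pixel image, one white and one black, that any PGM reader should accept.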

Image Enhancement

  • ImageMagick Border Removal
    Performs image enhancement by automatically detecting and removing black borders as well as noise regions from scanned document image files using ImageMagick.
  • IMPACT NCSR Border Removal Service
    Performs image enhancement by automatically detecting and removing black borders as well as noise regions from scanned document image files. Learn more.
  • IMPACT NCSR Geometric Correction Service
    Performs image enhancement by automatically correcting geometric distortions typically found in scanned document image files. Learn more.
  • ImageMagick Deskewing
    Straightens an image to improve the detection of structures and text using ImageMagick.
  • NCSR Binarisation Service
    Performs image binarisation using an algorithm developed at NCSR.
  • Fraunhofer IAIS mydec Deskewer
    mydec is software for automatic and manual media development for cultural and media organizations. It provides metadata from the cloud, making it possible to browse, combine, and distribute media content on the web. The Deskewer corrects the alignment of input images in order to improve the detection of structures and text. Learn more.
  • Fraunhofer IAIS mydec Color Binarize
    Color Binarize separates letters from the background by converting grayscale images to binary. The optimal contrast threshold for the separation can be calculated either once for the entire image or individually for each pixel. Learn more.
  • Fraunhofer IAIS mydec Deshadow
    The natural aging of paper can cause the contrast between paper and writing to deteriorate. Fraunhofer IAIS mydec Deshadow removes these aging effects automatically to support subsequent processing. Learn more.
  • Ocropus binarisation and dewarping service
    Performs the binarisation and dewarping processing using the Ocropus technology.
  • Unpaper
    unpaper is a post-processing tool for scanned sheets of paper, especially for book pages scanned from previously created photocopies. unpaper tries to remove dark edges, correct the rotation (“deskew”), and align the centering of pages.
  • ABBYY FineReader 11 Binarisation Service
    Performs binarisation using ABBYY FineReader 11 technology.
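
Several of the services above binarise grayscale images, with the threshold computed either for the whole image or per pixel. The following is a minimal sketch of that distinction only, not the algorithm used by any of the listed tools. The function names, the mean-based thresholds, and the `radius` parameter are all assumptions made for illustration.

```python
def binarize_global(gray):
    """Threshold every pixel against the mean of the whole image (1 = light, 0 = dark)."""
    pixels = [p for row in gray for p in row]
    threshold = sum(pixels) / len(pixels)
    return [[1 if p > threshold else 0 for p in row] for row in gray]


def binarize_local(gray, radius=1):
    """Threshold each pixel against the mean of its own square neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the neighbourhood, clipped at the image borders.
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(1 if gray[y][x] > sum(window) / len(window) else 0)
        out.append(row)
    return out
```

A single global threshold is cheap but treats an unevenly lit page as if it were uniform, while a per-pixel threshold adapts to its surroundings at the cost of more computation.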

Segmentation

  • IMPACT ABBYY FineReader 10 PAGE Segmentation Service
    Performs segmentation of an input image file using ABBYY FineReader 10 and exports the results in PAGE format. Learn more.
  • Fraunhofer Newspaper Segmenter & Korrektor
    The Korrektor is a manual post-correction tool for automatically processed newspaper scans. By loading the result XML files into the software, it is possible to correct automatically detected layout elements, texts and other properties. The scanned documents are displayed in two separate windows to allow for a detailed inspection.
    Results can be edited using context menus, drag and drop and keyboard shortcuts.

OCR Training

  • Cutouts
    Cutouts supports the preparation of training material for the OCR system. Proper training material here means a set of shapes (areas) separated from the source document that together make up the font used to print that document. Learn more.

OCR Engines

  • ABBYY FineReader 11 SDK version
    ABBYY FineReader is optical character recognition (OCR) software that provides unmatched text recognition accuracy and conversion capabilities, virtually eliminating retyping and reformatting of documents. Intuitive use and one-click automated tasks let you do more in fewer steps. Up to 190 languages are supported for text recognition, more than any other OCR software in this market.
  • ABBYY FineReader 11 with IMPACT User dictionaries
    Performs OCR using ABBYY FineReader technology with the external dictionaries developed during the IMPACT project.
  • IMPACT Tesseract 3.03 OCR Service
    Performs OCR on an input image file using Tesseract 3.03 technology.
  • Tesseract PAGE XML output v1.3
    Performs OCR on an input image file using Tesseract 3.03 technology and exports the output into PAGE XML format. More info.
  • GOCR
    GOCR is an OCR (Optical Character Recognition) program developed under the GNU General Public License. It converts scanned images of text back to text files.
  • Ocrad
    GNU Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method. It reads images in pbm (bitmap), pgm (greyscale) or ppm (color) formats and produces text in byte (8-bit) or UTF-8 formats.
  • Gamera OCR Module
    Gamera is a Python framework for building document analysis applications.
  • CuneiForm
    CuneiForm is a software tool for optical character recognition. It was originally developed at Cognitive Technologies and, after a few years with no development, released as freeware on December 12, 2007. The kernel of the OCR engine was released under the open source BSD license at the beginning of April 2008.

Evaluation

File Type and Encoding

Storage Estimator

  • IMPACT Storage Estimator
    This tool will estimate the storage required for the images and OCR output files made within your digitisation workflow.
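
The kind of arithmetic behind such an estimate can be sketched as follows. This is not the formula used by the IMPACT Storage Estimator. The function name and all parameters (average image size per page, average OCR output size per page, number of stored copies) are assumptions chosen for illustration.

```python
def estimate_storage_bytes(pages, image_bytes_per_page, ocr_bytes_per_page, copies=1):
    """Estimate total storage for page images plus OCR output, in bytes."""
    # Each page contributes one image file and one OCR output file.
    per_page = image_bytes_per_page + ocr_bytes_per_page
    # 'copies' accounts for, e.g., a master copy plus backups (hypothetical parameter).
    return pages * per_page * copies
```

For example, 1,000 pages at 5,000,000 bytes per image and 50,000 bytes per OCR file give `estimate_storage_bytes(1000, 5_000_000, 50_000)`, that is 5,050,000,000 bytes for a single copy.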

Cost Estimator

Other

  • IMPACT INL Named Entities Recognition Service
    Performs recognition and tagging of named entities (persons, locations and organizations) in a text file. Learn more.
  • INL Lemmatizer
    INL-developed tagger-lemmatizer for historical Dutch, where the tagger is trained on the “Letters as loot” corpus and the lemmatizer is based on the INL historical lexicon. Learn more.
  • Decompress
    This tool decompresses a compressed file. It is useful in the execution of workflows.
  • JHOVE2
    The JHOVE2 project generalizes the concept of format characterization to include identification, validation, feature extraction, and policy-based assessment. Learn more.
  • Stanford NER
    Stanford NER is a tool that can mark and extract named entities (persons, locations, organizations or even titles) from a text file. It uses a supervised learning technique, which means it has to be trained with a manually tagged training file before it is applied to other text. Learn more.
  • Mallet
    MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
  • Taverna Workflow Management System
    Executes workflows created with the Taverna Workflow Management System.

Acknowledgement: This platform was created by the IMPACT project and further developed by Succeed (a project funded by the European Union under FP7-ICT).