Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

korrektor

  • Description: GUI-based software for viewing and correcting document analysis results
  • Group: text recognition
  • Type: postcorrection
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: fraunhofer iais

layout evaluation

  • Description: Performance evaluation tool for layout analysis and segmentation methods, based on detailed metrics (types of errors such as merges, splits, missed regions, etc.) and use scenarios.
  • Group: evaluation
  • Type: layout
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: university of salford (prima)

line and word segmentation

  • Description: Segmentation of text regions into text lines and words, independent of text recognition (OCR).
  • Group: image processing
  • Type: image segmentation
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: university of salford (prima)

mallet

  • Description: MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
  • Group: text processing
  • Type: nlp tools
  • Subtype: nlp toolset and resources
  • License:
  • Language: n/a
  • Developer: umasscs school of computer science

morphadorner

  • Description: MorphAdorner is a Java command-line program which acts as a pipeline manager for processes performing morphological adornment of words in a text. These include language recognition, lemmatization, lexicon lookup, etc.
  • Group: Text Processing
  • Type: -
  • Subtype:
  • License: http://morphadorner.northwestern.edu/morphadorner/licenses/
  • Language: English
  • Developer: Northwestern University Information Technology

nert

  • Description: NERT is a tool that can mark and extract named entities (persons, locations, and organizations) from a text file. It uses a supervised learning technique, which means it has to be trained with a manually tagged training file before it is applied to other text. In addition, version 2.0 and higher of the tool comes with a named entity matcher module, which makes it possible to group variants or to assign modern word forms of named entities to old spelling variants. The tool in this package is based on the named entity recognizer from Stanford University, which has been extended for use in IMPACT. The extensions include the aforementioned matcher module and a module that reduces spelling variation within the data, leading to improved performance.
  • Group: text processing
  • Type: nlp tools
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: info.inl.nl

ocrad

  • Description: GNU Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method. It reads images in pbm (bitmap), pgm (greyscale), or ppm (color) formats and produces text in byte (8-bit) or UTF-8 formats. It also includes a layout analyser able to separate the columns or blocks of text normally found on printed pages. Ocrad can be used as a stand-alone console application or as a backend to other programs.
  • Group: text recognition
  • Type: core text recognition
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: -

ocre

  • Description: Spanish OCR prototype
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: unknown
  • Language: English, Euskara/Basque, French, German, Polish, Portuguese, Russian, Spanish
  • Developer: -

ocrevalUAtion

  • Description: This OCR evaluation tool compares a reference text with OCR output, and can also compare the output of two different OCR engines.
  • Group: text processing
  • Type: language resources
  • Subtype: evaluation
  • License:
  • Language: n/a
  • Developer: Rafa C. Carrasco

ocropus

  • Description: OCRopus is an OCR system focusing on the use of large-scale machine learning for addressing problems in document analysis.
  • Group: text recognition
  • Type: core text recognition
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: ocropus project

open-jpeg

  • Description: The OpenJPEG library is an open-source JPEG 2000 codec written in C. It has been developed in order to promote the use of JPEG 2000, the new still-image compression standard from the Joint Photographic Experts Group (JPEG). In addition to the basic codec, various other features are under development, among them the JP2 and MJ2 (Motion JPEG 2000) file formats, an indexing tool useful for the JPIP protocol, JPWL tools for error resilience, a Java viewer for j2k images, ...
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: image enhancement
  • License:
  • Language: n/a
  • Developer:

post correction tool

  • Description: Interactive post-correction of OCRed documents
  • Group: text recognition
  • Type: postcorrection
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: Centrum für Informations- und Sprachverarbeitung (CIS), University of Munich

