Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

Tools

digilib

  • Description: Digilib is a web-based client/server image viewing environment for the internet.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: creating presentation version
  • License: GNU GPL
  • Language: -
  • Developer: Max-Planck-Institute for the History of Science, the Bibliotheca Hertziana and the University of Bern

document deskewer

  • Description: Generic skew detection and correction (for the full range of 0-360 degrees) for documents printed in Roman script.
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: fraunhofer iais
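The deskewer's own algorithm is not described in this listing. As a rough illustration of skew detection, the Python sketch below uses a projection-profile search, a common textbook technique and not necessarily what Fraunhofer IAIS implements: for each candidate angle it rotates the coordinates of foreground (ink) pixels and scores how sharply they fall into horizontal rows. The function name `estimate_skew` and its parameters are hypothetical, and it only searches a small range (±10° by default) rather than the full 0-360 degree range the tool covers; detecting upside-down or sideways pages would need an extra orientation step.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate the skew angle (degrees) of text lines by a projection-profile search.

    points: iterable of (x, y) coordinates of foreground (ink) pixels.
    With image coordinates (y pointing down), a positive result means the
    text lines descend to the right.
    """
    points = list(points)
    best_angle, best_score = 0.0, -1.0
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        s, c = math.sin(theta), math.cos(theta)
        # Rotate every point by -angle and bin its new y coordinate into a row.
        rows = Counter(round(y * c - x * s) for x, y in points)
        # A sharp profile (ink concentrated in few rows) gives a large sum of squares.
        score = sum(n * n for n in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Correcting the page then amounts to rotating the image by the negative of the returned angle. The search step trades accuracy for speed; a coarse-to-fine search is a common refinement.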

fraunhofer iais mydec color binarize

  • Description: Color binarize separates letters from the background by converting grayscale images to binary. The optimal contrast threshold for this separation can be calculated either for the entire image or individually for each pixel.
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: image enhancement
  • License:
  • Language: n/a
  • Developer:

fraunhofer iais mydec color binarize

  • Description: Color binarize separates letters from the background by converting grayscale images to binary. The optimal contrast threshold for this separation can be calculated either for the entire image or individually for each pixel.
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: image enhancement
  • License:
  • Language: n/a
  • Developer: the université françois-rabelais in tours
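The description distinguishes a threshold computed for the entire image from one computed per pixel. The minimal pure-Python sketch below illustrates both ideas, assuming an 8-bit grayscale image stored as a list of rows with dark text on a light background (1 = ink, 0 = background). The global variant uses Otsu's method and the local variant compares each pixel with the mean of its neighbourhood; these are standard techniques chosen for illustration, not necessarily what MyDEC implements, and the function names and the `radius` and `offset` values are hypothetical.

```python
def otsu_threshold(gray):
    """Global threshold by Otsu's method; pixels <= the result count as ink."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w_b = sum_b = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]            # number of pixels at or below t
        sum_b += t * hist[t]
        w_f = total - w_b         # number of pixels above t
        if w_b == 0:
            continue
        if w_f == 0:
            break
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        # Between-class variance (unnormalised); Otsu picks the t that maximises it.
        between = w_b * w_f * (m_b - m_f) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def global_binarize(gray):
    """One threshold for the entire image."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def local_binarize(gray, radius=7, offset=10):
    """Per-pixel threshold: ink if darker than the local mean by more than offset."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out
```

On an unevenly lit page, a single global threshold can classify the whole background of the darker region as ink, whereas the per-pixel variant adapts to the local brightness. Near abrupt lighting changes the plain local mean can still produce false positives, which is why published local methods such as Sauvola's also use the local standard deviation.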

fraunhofer newspaper segmenter

  • Description: The Korrektor is a manual post-correction tool for automatically processed newspaper scans. By loading the result XML files into the software, it is possible to correct automatically detected layout elements, texts and other properties. The scanned documents are displayed in two separate windows to allow for a detailed inspection. Results can be edited using context menus, drag and drop and keyboard shortcuts.
  • Group: layout analysis
  • Type: nlp tools
  • Subtype: 0
  • License:
  • Language: n/a
  • Developer:

functional extension parser

  • Description: The Functional Extension Parser (FEP) is a document understanding tool capable of decoding the layout elements of books. Based on Optical Character Recognition (OCR) output, layout elements such as page numbers, running titles, headings and footnotes are detected and annotated.
  • Group: layout analysis
  • Type: nlp tools
  • Subtype:
  • License:
  • Language: n/a
  • Developer: university of innsbruck
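FEP's actual rules are not given in this listing. To make the idea of annotating layout elements from OCR output concrete, the sketch below applies two simple, assumed heuristics to OCR text lines grouped by page: a purely numeric line at the top or bottom of a page is labelled a page number, and a top line that recurs on at least half of the pages (`min_repeat`, an arbitrary choice) is labelled a running title. The function and label names are hypothetical; headings and footnotes, which FEP also detects, would need layout cues such as font size and position and are left out.

```python
import re
from collections import Counter

# Assumption: page numbers are short Arabic numerals on a line of their own.
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")


def annotate_pages(pages, min_repeat=0.5):
    """Label OCR lines as 'page_number', 'running_title' or 'body'.

    pages: list of pages, each a list of OCR text lines from top to bottom.
    Returns a list of pages, each a list of (label, text) tuples.
    """
    # Count how often each (normalised) top line occurs across the book.
    first_lines = Counter(p[0].strip().lower() for p in pages if p)
    threshold = max(2, min_repeat * len(pages))
    result = []
    for page in pages:
        labelled = []
        for i, line in enumerate(page):
            at_edge = i == 0 or i == len(page) - 1
            if at_edge and PAGE_NUMBER.match(line):
                label = "page_number"
            elif i == 0 and first_lines[line.strip().lower()] >= threshold:
                label = "running_title"
            else:
                label = "body"
            labelled.append((label, line))
        result.append(labelled)
    return result
```

Real books often alternate running titles between verso and recto pages, combine the page number and title on one line, and use Roman numerals in front matter, so a production tool needs considerably more robust rules than these.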

gamera ocr

  • Description: OCR toolkit for Gamera: This is a Gamera toolkit for building standard text recognition applications. It is based on the Gamera framework and requires a working Gamera installation.
  • Group: text recognition
  • Type: core text recognition
  • Subtype: framework
  • License:
  • Language: n/a
  • Developer: -

gimp

  • Description: GIMP is the GNU Image Manipulation Program. It is a freely distributed program for tasks such as photo retouching, image composition and image authoring.
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: -

gocr

  • Description: GOCR is an OCR (Optical Character Recognition) program developed under the GNU General Public License. It converts scanned images of text back into text files.
  • Group: text recognition
  • Type: core text recognition
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: -

hOCR

  • Description: HOCR is a Hebrew optical character recognition library.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv3
  • Language: -
  • Developer: -

hOCR tools

  • Description: hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages and common layout options. Furthermore, unlike previous OCR formats, the recognized text and the OCR-related information co-exist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype:
  • License: ASL 2.0
  • Language: -
  • Developer: -
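In hOCR, each recognised element is an ordinary HTML element whose `class` names its type (for example `ocr_line` or `ocrx_word`) and whose `title` attribute carries semicolon-separated properties such as `bbox x0 y0 x1 y1` and the word confidence `x_wconf`. The sketch below uses Python's standard `html.parser` module to pull words, bounding boxes and confidences out of such markup. The class name `HocrWords` is hypothetical, and the parser deliberately ignores every other hOCR property.

```python
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect (text, bbox, confidence) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # properties of the word currently being read
        self._depth = 0       # nesting depth inside the current word element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            self._depth += 1  # e.g. <em> or <strong> inside a word
            return
        if "ocrx_word" in (attrs.get("class") or "").split():
            # The title attribute holds "key value; key value; ..." pairs.
            props = {}
            for part in (attrs.get("title") or "").split(";"):
                key, _, value = part.strip().partition(" ")
                if key:
                    props[key] = value
            bbox = tuple(int(v) for v in props.get("bbox", "").split()) or None
            conf = int(props["x_wconf"]) if "x_wconf" in props else None
            self._current = {"text": "", "bbox": bbox, "conf": conf}
            self._depth = 1

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            w = self._current
            self.words.append((w["text"], w["bbox"], w["conf"]))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data
```

Because the OCR data lives in plain HTML attributes, the same file can be opened in a browser, where the recognised text is displayed and the `title` properties stay invisible.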

imagemagick / graphicsmagick

  • Description: ImageMagick is a software suite to create, edit, compose or convert bitmap images. GraphicsMagick is the Swiss Army knife of image processing. It was derived from ImageMagick 5.5.2.
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: imagemagick studio / graphicsmagick group
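In digitisation workflows ImageMagick is often driven from scripts. The sketch below builds a command line for grayscale conversion, automatic deskewing and global thresholding with ImageMagick's `-colorspace`, `-deskew` and `-threshold` options; the 40% deskew threshold and 50% binarisation threshold are illustrative values, and the helper names are hypothetical. It prefers the ImageMagick 7 `magick` entry point and falls back to the version 6 `convert` command. GraphicsMagick (invoked as `gm convert`) is not covered, since option support differs between the two projects.

```python
import shutil
import subprocess


def preprocess_command(src, dst, deskew=True, threshold="50%"):
    """Build an ImageMagick argv: grayscale, optional deskew, global threshold."""
    exe = shutil.which("magick")
    argv = [exe] if exe else ["convert"]  # ImageMagick 7 vs. 6 entry point
    argv += [src, "-colorspace", "Gray"]
    if deskew:
        argv += ["-deskew", "40%"]
    argv += ["-threshold", threshold, dst]
    return argv


def preprocess(src, dst, **kwargs):
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(preprocess_command(src, dst, **kwargs), check=True)


# Example (requires ImageMagick and an input file):
# preprocess("page.tif", "page.png")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.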
