Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

286 results

Tools

Freeling - NLP toolset and resources

  • Description: FreeLing is a library that provides language analysis services for Natural Language Processing. It is designed to be used as an external library by any application that requires such services. A simple main program is also provided as a basic interface to the library, which lets users analyze text files from the command line. In practice, many users do not develop on FreeLing but use it as a text processing tool.
  • Group: Text processing
  • Type: NLP Tools
  • Subtype: NLP toolset and resources
  • License: GPL
  • Language: Any
  • Developer: http://www.talp.upc.edu/

Freeling - Tokenizer

  • Description: Tokenization rules are regular expressions that are matched against the beginning of the text line being processed. The first matching rule is used to extract the token, the matching substring is deleted from the line, and the process is repeated until the line is empty.
  • Group: Text processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License: GPL
  • Language: -
  • Developer: http://www.talp.upc.edu/
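
The rule-matching loop described above can be sketched in a few lines of Python. This is only an illustration of the first-match-wins idea, not FreeLing's implementation: the rule names, the patterns, the whitespace skipping, and the single-character fallback are all assumptions made for the example.

```python
import re

# Hypothetical rule set for illustration; these are not FreeLing's actual rules.
# Order matters: the first rule that matches at the start of the line wins.
RULES = [
    ("NUMBER", re.compile(r"\d+(?:\.\d+)?")),
    ("WORD", re.compile(r"\w+")),
    ("PUNCT", re.compile(r"[^\w\s]")),
]


def tokenize(line, rules=RULES):
    """Split a line into (rule_name, token) pairs using first-match-wins rules."""
    tokens = []
    line = line.lstrip()
    while line:
        for name, pattern in rules:
            match = pattern.match(line)  # match() anchors at the start of the string
            if match and match.end() > 0:
                tokens.append((name, match.group(0)))
                # Delete the matched substring, then skip whitespace (an assumption).
                line = line[match.end():].lstrip()
                break
        else:
            # No rule matched: emit one character so the loop always terminates.
            tokens.append(("UNKNOWN", line[0]))
            line = line[1:].lstrip()
    return tokens


print(tokenize("Pi is 3.14, roughly."))
```

Because `NUMBER` is listed before `WORD`, the string `3.14` is extracted as one token instead of being split at the decimal point, which shows why rule order matters in a first-match scheme.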

Frog

  • Description: Frog's current version will tokenize, tag, lemmatize, and morphologically segment word tokens in Dutch text files; assign a dependency graph to each sentence; identify the base phrase chunks in the sentence; and attempt to find and label all named entities.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: GPL
  • Language: Dutch
  • Developer: ILK Research Group

FromThePage

  • Description: FromThePage is free software that allows volunteers to transcribe handwritten documents online.
  • Group: text recognition
  • Type: Postcorrection
  • Subtype: Transcription
  • License:
  • Language: -
  • Developer: Ben Brumfield

FromThePage - Miscellaneous Utilities

  • Description: FromThePage is an open-source tool that lets volunteers collaborate on transcribing handwritten documents.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: Transcription
  • License: AGPL
  • Language: -
  • Developer: Ben Brumfield

FromThePage - Text Recognition

  • Description: FromThePage is free software that allows volunteers to transcribe handwritten documents online.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: Transcription
  • License: GNU AGPL v3
  • Language: -
  • Developer: Ben Brumfield

GATE

  • Description: Open-source software capable of solving almost any text processing problem.
  • Group: Text Processing
  • Type: -
  • Subtype:
  • License: free
  • Language: -
  • Developer: Gate project

GEDI Ground Truthing Environment

  • Description: GEDI is a generic annotation tool that assists you in ground truthing scanned text documents. It works with two types of files: an image file and a corresponding .xml file in GEDI format.
  • Group: Evaluation
  • Type: OCR (text)
  • Subtype: GT production
  • License: Own license
  • Language: -
  • Developer: -

GTText

  • Description: Free OCR software and ground-truthing tool for color images with text. The gttext project helps create high-quality ground-truth datasets quickly from color text images.
  • Group: Evaluation
  • Type: OCR (text)
  • Subtype: GT production
  • License: GPLv2
  • Language: -
  • Developer: -

Gamera

  • Description: Gamera is not a packaged document recognition system but a toolkit for building document image recognition systems. It makes the development of new recognition systems easy. Gamera is a cross-platform library for the Python programming language.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype:
  • License: GPL v2+
  • Language: -
  • Developer: -

Geometric Correction: Arbitrary Warping

  • Description: Software for the correction of arbitrary local distortions in scans of historical documents.
  • Group: Image Processing
  • Type: Image Processing and Enhancement
  • Subtype: -
  • License: commercial
  • Language: Not applicable
  • Developer: University of Salford (PRImA)

Graph-based dependency parser

  • Description: Bernd Bohnet. 2010. Top Accuracy and Fast Dependency Parsing is not a Contradiction. The 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China.
  • Group: text processing
  • Type: NLP Tools
  • Subtype: Parser
  • License:
  • Language: -
  • Developer: https://code.google.com/p/mate-tools/

