Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

LingPipe

  • Description: LingPipe's text classifiers learn by example. For each language being classified, a sample of text is used as training data. LingPipe learns the distribution of characters per language using character language models, which provide state-of-the-art accuracy for text classification. Character-level models are particularly well suited to language identification because they do not require tokenized input, and tokenizers are often language-specific.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: Free
  • Language: Any
  • Developer: http://alias-i.com/lingpipe/index.html
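The character-language-model approach described above can be sketched in a few lines of Python. This is a minimal illustration of the technique, not LingPipe's (Java) API: the class `CharNgramLM`, the function `identify`, the trigram order and the add-one smoothing are assumptions chosen for brevity, and LingPipe's own smoothing is not reproduced here. Each language gets one model trained on sample text, and a new text is assigned to the language whose model gives it the highest log-probability.

```python
import math
from collections import Counter


class CharNgramLM:
    """Character n-gram language model with add-one smoothing (illustrative only)."""

    def __init__(self, text: str, n: int = 3):
        self.n = n
        padded = self._pad(text)
        # Count every n-gram and its (n-1)-character context in the training text.
        self.ngrams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
        self.contexts = Counter(padded[i:i + n - 1] for i in range(len(padded) - n + 1))
        self.vocab_size = len(set(padded))

    def _pad(self, text: str) -> str:
        # Lower-case and pad the start so the first characters also get a context.
        return " " * (self.n - 1) + text.lower()

    def log_prob(self, text: str) -> float:
        padded = self._pad(text)
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            context = gram[:-1]
            # Add-one smoothing gives unseen n-grams a small non-zero probability;
            # the extra +1 in the denominator reserves mass for unseen characters.
            numerator = self.ngrams[gram] + 1
            denominator = self.contexts[context] + self.vocab_size + 1
            total += math.log(numerator / denominator)
        return total


def identify(text: str, models: dict) -> str:
    """Return the language whose model assigns `text` the highest log-probability."""
    return max(models, key=lambda lang: models[lang].log_prob(text))
```

Because the models read raw character sequences, no tokenizer is involved at any point. With one model trained on a short English sentence and one on a short German sentence, a phrase such as "der hund und die katze" should be assigned to the German model.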

LingPipe - NER

  • Description: LingPipe is a toolkit for processing text using computational linguistics. LingPipe is used for tasks such as finding the names of people, organizations, or locations in news; automatically classifying Twitter search results into categories; and suggesting correct spellings of queries.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Limited version free; production version for a fee
  • Language: All (in principle)
  • Developer: http://alias-i.com/lingpipe/index.html

LingPipe - NLP toolset and resources

  • Description: LingPipe is a toolkit for processing text using computational linguistics.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NLP toolset and resources
  • License: Free/Commercial
  • Language: -
  • Developer: http://alias-i.com/lingpipe/index.html

LingPipe - Tokenizer

  • Description: Part-of-speech tagging is a process whereby tokens are sequentially labeled with syntactic labels such as "finite verb", "gerund" or "subordinating conjunction". This tutorial shows how to train a part-of-speech tagger and compile its model to a file, how to load a compiled model from a file and perform part-of-speech tagging, and finally how to evaluate and tune models.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License: unknown
  • Language: Any
  • Developer: http://alias-i.com/lingpipe/index.html

Link Grammar Parser

  • Description: The Link Grammar Parser is a syntactic parser of English based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure consisting of a set of labeled links connecting pairs of words. The parser also produces a "constituent" representation of a sentence (showing noun phrases, verb phrases, etc.).
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: GPL
  • Language: English
  • Developer: http://www.abisource.com/
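A linkage, as described above, is a set of labeled links between pairs of words. The Python sketch below is an illustration, not the parser's own API: it represents a linkage as hypothetical `(left_index, right_index, label)` tuples and checks two conditions link grammar places on a valid linkage, namely that links do not cross when drawn above the sentence (planarity) and that every word is reachable through links (connectivity). The parser additionally checks each word's linking requirements from its dictionary, which is not modelled here, and the labels in the usage note are illustrative rather than actual parser output.

```python
from itertools import combinations


def links_cross(a, b):
    """True if two links (pairs of word positions) cross when drawn above the sentence."""
    (i, j), (k, m) = sorted(a), sorted(b)
    # Links that share an endpoint, or that are nested, do not cross.
    return i < k < j < m or k < i < m < j


def is_valid_linkage(n_words, links):
    """Check planarity and connectivity of a linkage given as (left, right, label) tuples."""
    spans = [(left, right) for left, right, _label in links]

    # Planarity: no two links may cross.
    if any(links_cross(a, b) for a, b in combinations(spans, 2)):
        return False

    # Connectivity: union-find over word positions.
    parent = list(range(n_words))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right in spans:
        parent[find(left)] = find(right)
    return len({find(w) for w in range(n_words)}) == 1
```

For "the dog chased a cat" (positions 0 to 4), the links `[(0, 1, "D"), (1, 2, "S"), (2, 4, "O"), (3, 4, "D")]` pass both checks, whereas links `(0, 2)` and `(1, 3)` in the same linkage would cross and be rejected.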

Lios

  • Description: Lios is free and open-source software for converting print into text using either a scanner or a camera. It can also extract text from scanned images from other sources, such as PDFs, images, or folders containing images.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv3
  • Language: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, French, German, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Russian-English bilingual, Serbian, Slovene, Spanish, Swedish, Turkish and Ukrainian
  • Developer: -

Longan

  • Description: A flexible, pure-Java OCR implementation. The aim of this project is to write a reasonably competent, modular and understandable OCR system.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: ASL 2.0
  • Language: -
  • Developer: -

MBT – Memory-Based Tagger-Generator

  • Description: MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS tagger
  • License: GNU GPL v3
  • Language: 2
  • Developer: http://ilk.uvt.nl/contact/
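Memory-based tagging stores every training example and tags new tokens by analogy with the most similar stored examples. The Python sketch below illustrates the idea with a plain nearest-neighbour lookup under the overlap metric (the number of mismatching features); it is not MBT's implementation. The feature set (previous tag, current word, next word), the names `build_memory` and `tag`, and the majority vote among all equally near examples are assumptions for illustration; MBT itself builds on the TiMBL classifier and uses richer features and feature weighting.

```python
from collections import Counter


def features(words, i, prev_tag):
    """Assumed feature vector for position i: previous tag, focus word, next word."""
    next_word = words[i + 1].lower() if i + 1 < len(words) else "</s>"
    return (prev_tag, words[i].lower(), next_word)


def build_memory(tagged_sentences):
    """Store every training token as a (features, tag) instance, with no abstraction."""
    memory = []
    for sentence in tagged_sentences:
        words = [word for word, _tag in sentence]
        prev_tag = "<s>"
        for i, (_word, tag) in enumerate(sentence):
            memory.append((features(words, i, prev_tag), tag))
            prev_tag = tag
    return memory


def overlap_distance(a, b):
    """Number of feature positions on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def tag(words, memory):
    """Tag left to right; each token gets the majority tag of its nearest stored instances."""
    tags = []
    prev_tag = "<s>"
    for i in range(len(words)):
        query = features(words, i, prev_tag)
        distances = [(overlap_distance(query, feats), t) for feats, t in memory]
        best = min(d for d, _t in distances)
        votes = Counter(t for d, t in distances if d == best)
        prev_tag = votes.most_common(1)[0][0]
        tags.append(prev_tag)
    return tags
```

Trained on three toy sentences such as "the dog barks" (DET NOUN VERB), this tagger labels the unseen sentence "a dog sleeps" by analogy with the closest stored instances, even though that exact word sequence never occurred in training.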

METS page turner

  • Description: A pure XSLT solution for displaying image files along with selected descriptive, administrative and structural metadata elements of a digital object serialized into an XML-encoded METS document. This application evolved from METSFramesSX.xsl and incorporates a frames-based page turner with search functionality using XPath.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: creating presentation version
  • License: unknown
  • Language: -
  • Developer: New York University Digital Library Technology Services
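The structural metadata a page turner relies on lives in the METS `structMap`, whose `div` elements carry an `ORDER` attribute and `fptr` pointers to files declared in the `fileSec`. The Python sketch below, which is not the NYU XSLT itself, shows how a page sequence could be recovered from such a document using only the standard library. It assumes the physical structure map is marked `TYPE="physical"` and treats any `div` with an `ORDER` attribute as a page; real METS profiles vary in these conventions, and `page_sequence` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def page_sequence(mets_xml):
    """Return (order, label, [file hrefs]) for each ordered div, sorted by ORDER."""
    root = ET.fromstring(mets_xml)

    # Map each fileSec file ID to the location given in its FLocat child.
    hrefs = {}
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        flocat = file_el.find(f"{{{METS_NS}}}FLocat")
        if flocat is not None:
            hrefs[file_el.get("ID")] = flocat.get(f"{{{XLINK_NS}}}href")

    pages = []
    for struct_map in root.iter(f"{{{METS_NS}}}structMap"):
        if struct_map.get("TYPE", "").lower() != "physical":
            continue  # assumption: only the physical structure map lists pages
        for div in struct_map.iter(f"{{{METS_NS}}}div"):
            order = div.get("ORDER")
            if order is None:
                continue  # container divs (e.g. the whole volume) carry no ORDER here
            files = [
                hrefs.get(fptr.get("FILEID"))
                for fptr in div.findall(f"{{{METS_NS}}}fptr")
            ]
            pages.append((int(order), div.get("LABEL"), files))

    return sorted(pages, key=lambda page: page[0])
```

Sorting on `ORDER` rather than on document order means pages display correctly even if the `div` elements were serialized out of sequence.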

MILE ocr-performance-evaluator

  • Description: A desktop application used for performance evaluation of Optical Character Recognizers (OCR). It is implemented with Eclipse SWT and runs on Windows and Linux.
  • Group: Evaluation
  • Type: OCR (text)
  • Subtype: evaluation
  • License: ASL 2.0
  • Language: -
  • Developer: -
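OCR evaluation typically compares recognised text against a ground-truth transcription. A common measure is the character error rate: the edit (Levenshtein) distance between the two strings divided by the length of the reference, with the word error rate computed the same way over word sequences. The description does not say which metrics MILE reports, so the function names and the choice of metric in this Python sketch are assumptions.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (item_a != item_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance normalised by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)
```

Because the denominator is the reference length, both rates can exceed 1 when the OCR output contains many spurious insertions.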

MINT

  • Description: MINT services make up a web-based platform designed and developed to facilitate aggregation initiatives for cultural heritage content and metadata in Europe. It is employed from the first steps of such workflows, namely the ingestion, mapping and aggregation of metadata records, and goes on to implement a variety of remediation approaches for the resulting repository.
  • Group: Metadata Processing
  • Type: -
  • Subtype: -
  • License: unknown
  • Language: -
  • Developer: Image, Video and Multimedia Systems Lab (IVML) of the ICCS

MaltParser

  • Description: MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. MaltParser is developed by Johan Hall, Jens Nilsson and Joakim Nivre at Växjö University and Uppsala University, Sweden.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: http://www.maltparser.org/license.html
  • Language: English French Swedish Spanish
  • Developer: http://www.maltparser.org/contact.html
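MaltParser is a transition-based parser: a parse is built by a sequence of transitions over a stack and an input buffer, and the model induced from a treebank learns to predict the next transition. The Python sketch below shows the arc-eager transition system (one of the systems MaltParser supports) together with a static oracle that derives the transition sequence from a gold-standard tree, which is how training examples can be obtained from treebank data. The function names and the head-dictionary representation (token index to head index, with 0 as the artificial root) are assumptions; the classifier that would replace the oracle at parse time is omitted, and the oracle assumes a projective tree.

```python
def oracle(stack, buffer, gold):
    """Static arc-eager oracle: choose the transition that reproduces the gold tree."""
    s, b = stack[-1], buffer[0]
    if s != 0 and gold[s] == b:
        return "LEFT-ARC"
    if gold[b] == s:
        return "RIGHT-ARC"
    # Reduce if the buffer front is linked to some word deeper in the stack.
    if any(gold[b] == k or (k != 0 and gold[k] == b) for k in stack[:-1]):
        return "REDUCE"
    return "SHIFT"


def parse_with_oracle(n_tokens, gold):
    """Run the arc-eager system on tokens 1..n; return (heads, transitions)."""
    stack, buffer = [0], list(range(1, n_tokens + 1))
    heads, transitions = {}, []
    while buffer:
        t = oracle(stack, buffer, gold)
        transitions.append(t)
        s, b = stack[-1], buffer[0]
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":
            heads[s] = b          # arc b -> s, then pop s
            stack.pop()
        elif t == "RIGHT-ARC":
            heads[b] = s          # arc s -> b, then push b
            stack.append(buffer.pop(0))
        else:  # REDUCE
            if s not in heads:
                raise ValueError("REDUCE on a token without a head; tree may be non-projective")
            stack.pop()
    return heads, transitions
```

For "the dog chased a cat" with gold heads `{1: 2, 2: 3, 3: 0, 4: 5, 5: 3}`, the oracle produces SHIFT, LEFT-ARC, SHIFT, LEFT-ARC, RIGHT-ARC, SHIFT, LEFT-ARC, RIGHT-ARC, and applying those transitions rebuilds exactly the gold tree. A trained parser would see the same configurations and learn to predict these transitions without access to the gold heads.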

