Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

MapForce

  • Description: Altova MapForce® 2013 is an award-winning, any-to-any graphical data mapping, conversion, and integration tool that maps data between any combination of XML, database, flat file, EDI, Excel, XBRL, and/or Web service, then transforms data instantly or autogenerates royalty-free data integration code for the execution of recurrent conversions.
  • Group: Metadata Processing
  • Type: -
  • Subtype:
  • License: commercial
  • Language: -
  • Developer: Altova

Metadata Extraction Tool

  • Description: The Metadata Extraction Tool was developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats, such as PDF documents, image files, sound files, Microsoft Office documents, and many others.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: metadata extraction
  • License: Apache License 2.0
  • Language: -
  • Developer: National Library of New Zealand

Minipar

  • Description: MINIPAR is a broad-coverage parser for the English language. An evaluation with the SUSANNE corpus shows that MINIPAR achieves about 88% precision and 80% recall with respect to dependency relationships. MINIPAR is very efficient: on a Pentium II 300 with 128 MB memory, it parses about 300 words per second.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: Unclear
  • Language: English
  • Developer: http://webdocs.cs.ualberta.ca/~lindek/minipar.htm

MontyChunker

  • Description: Lightning-fast regular-expression chunker.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Chunker
  • License: Free for non-commercial use
  • Language: 1
  • Developer: http://web.media.mit.edu/~hugo/montylingua/index.html

MontyLemmatiser

  • Description: Strips inflectional morphology, i.e. changes verbs to infinitive form and nouns to singular form.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Stemmer/Lemmatizer
  • License: Free for non-commercial use
  • Language: -
  • Developer: http://web.media.mit.edu/~hugo/montylingua/index.html

MontyTagger

  • Description: Part-of-speech tagging based on Brill94, enriched with common sense.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS tagger
  • License: Free for non-commercial use
  • Language: 1
  • Developer: http://web.media.mit.edu/~hugo/montylingua/index.html

MontyTokenizer

  • Description: Tokenizes raw English text (sensitive to abbreviations) and resolves contractions, e.g. "you're" ==> "you are".
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License: Free for non-commercial use
  • Language: -
  • Developer: -

Morfette

  • Description: In Morfette, lemmatization is cast as a classification task where a lemmatization class corresponds to the specification of the edit operations needed to transform the inflected word form into the corresponding lemma. The basic approach is described in (Chrupala et al. 2008 and Chrupala 2008). The current version of Morfette uses an averaged perceptron to fit the models rather than Maximum Entropy training. The lemmatization classes are Edit-Tree-based, as described in (Chrupala 2008).
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Stemmer/Lemmatizer
  • License:
  • Language: 0
  • Developer: https://sites.google.com/site/morfetteweb/

Morfette - Morphological Analysis

  • Description: Morfette is a tool for supervised learning of inflectional morphology. Given a corpus of sentences annotated with lemmas and morphological labels, and optionally a lexicon, Morfette learns how to morphologically analyse new sentences. In the learning stage, Morfette fits two separate logistic regression models: one for morphological tagging and one for lemmatization. The predictions of the models are combined dynamically and produce a globally plausible sequence of (morphological tag, lemma) pairs for a sentence.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Morphological Analysis
  • License: Unclear
  • Language: -
  • Developer: https://sites.google.com/site/morfetteweb/

Morfette - Stemmer/Lemmatizer

  • Description: In Morfette, lemmatization is cast as a classification task where a lemmatization class corresponds to the specification of the edit operations needed to transform the inflected word form into the corresponding lemma. The basic approach is described in (Chrupala et al. 2008 and Chrupala 2008). The current version of Morfette uses an averaged perceptron to fit the models rather than Maximum Entropy training. The lemmatization classes are Edit-Tree-based, as described in (Chrupala 2008).
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Stemmer/Lemmatizer
  • License: Unclear
  • Language: -
  • Developer: https://sites.google.com/site/morfetteweb/
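
The idea of treating a lemmatization class as a set of edit operations can be illustrated with a small sketch. This is not Morfette's code and does not implement its edit trees. It assumes a much simpler, suffix-only edit script (strip N trailing characters, then append a string), and the function names `edit_class` and `apply_class` are hypothetical. No classifier is trained here; the sketch only shows how a class label is derived from a (form, lemma) pair and reused on an unseen word.

```python
from typing import Tuple

# A lemmatization class in this simplified sketch:
# (number of trailing characters to strip, suffix to append).
EditClass = Tuple[int, str]


def edit_class(form: str, lemma: str) -> EditClass:
    """Derive a suffix-only edit script that turns `form` into `lemma`."""
    # Find the length of the longest common prefix of form and lemma.
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    # Everything after the shared prefix is stripped from the form
    # and replaced by the remainder of the lemma.
    return (len(form) - i, lemma[i:])


def apply_class(form: str, cls: EditClass) -> str:
    """Apply an edit script to a (possibly unseen) word form."""
    strip, suffix = cls
    return form[: len(form) - strip] + suffix


# The class learned from one pair generalises to other words
# that inflect the same way.
cls = edit_class("studies", "study")   # (3, "y")
print(apply_class("carries", cls))     # carry
```

In a supervised setting, a classifier would predict one of these class labels for each word in context, and the predicted edit script would then be applied to produce the lemma.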

NCSR binarisation and colour reduction

  • Description: Performs image binarisation using an algorithm developed at NCSR.
  • Group: Image Processing
  • Type: Image Processing and Enhancement
  • Subtype: ner
  • License:
  • Language: xhosa
  • Developer: National Center for Scientific Research (NCSR) "Demokritos"

NCSR border detection and removal

  • Description: This tool detects and removes noisy black borders as well as noisy text regions. Moreover, it detects the optimal page frames of double-page document images.
  • Group: Image Processing
  • Type: Image Processing and Enhancement
  • Subtype: image enhancement
  • License:
  • Language: xhosa
  • Developer: National Center for Scientific Research (NCSR) "Demokritos"

