Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

TextEval

  • Description: TextEval is an alternative OCR evaluation tool from PRImA research.
  • Group: text processing
  • Type: language resources
  • Subtype: evaluation
  • License:
  • Language: n/a
  • Developer:

Textlab

  • Description: An innovative image and text mark-up tool, TextLab is based on the protocols of fluid-text editing of revision. Here, "revision sites" are any areas of interest on a manuscript leaf or print page that indicate evidence of revision.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: Transcription
  • License: unknown
  • Language: -
  • Developer: Hofstra University

The Berkeley

  • Description: Parsing is the task of analyzing the grammatical structure of natural language. Given a sequence of words, a parser forms units like subject, verb, and object and determines the relations between these units according to some grammar formalism. Our work has focused on learning probabilistic context-free grammars (PCFGs), which assign a sequence of words the most likely parse tree. The parser supports a variety of languages and achieves state-of-the-art performance on most of them. For additional information and related projects, visit the Berkeley NLP website.
  • Group: text processing
  • Type: nlp tools
  • Subtype: parser
  • License: GPL
  • Language:
  • Developer: http://nlp.cs.berkeley.edu/
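
To illustrate what "assigning a sequence of words the most likely parse tree" means for a PCFG, here is a minimal sketch of probabilistic CKY parsing in Python. The toy grammar, its probabilities, and the function names (`cky`, `build_tree`, `best_parse`) are invented for illustration. This is a generic textbook algorithm under those assumptions and is not taken from the parser listed above.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustration only).
# BINARY maps (left child, right child) -> [(parent, probability)].
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("DET", "N"): [("NP", 0.6)],
}
# LEXICAL maps a word -> [(label, probability)].
LEXICAL = {
    "dogs": [("NP", 0.4), ("N", 0.5)],
    "cats": [("N", 0.5)],
    "the": [("DET", 1.0)],
    "chase": [("V", 1.0)],
}


def cky(words):
    """Fill a chart with the best log-probability per label and span."""
    n = len(words)
    # chart[(i, j)][label] = (log_prob, backpointer)
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for label, p in LEXICAL.get(word, []):
            chart[(i, i + 1)][label] = (math.log(p), word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for left, (left_lp, _) in chart[(i, k)].items():
                    for right, (right_lp, _) in chart[(k, j)].items():
                        for parent, p in BINARY.get((left, right), []):
                            lp = math.log(p) + left_lp + right_lp
                            cell = chart[(i, j)]
                            if parent not in cell or lp > cell[parent][0]:
                                cell[parent] = (lp, (k, left, right))
    return chart


def build_tree(chart, i, j, label):
    """Follow backpointers to recover the best tree as nested tuples."""
    _, back = chart[(i, j)][label]
    if isinstance(back, str):  # lexical entry: backpointer is the word
        return (label, back)
    k, left, right = back
    return (label, build_tree(chart, i, k, left), build_tree(chart, k, j, right))


def best_parse(words, root="S"):
    """Return (tree, probability), or (None, 0.0) if no parse exists."""
    chart = cky(words)
    top = chart[(0, len(words))]
    if root not in top:
        return None, 0.0
    return build_tree(chart, 0, len(words), root), math.exp(top[root][0])
```

With this toy grammar, `best_parse("dogs chase the cats".split())` yields a single tree rooted in `S` whose probability is the product of the rule probabilities used, 0.4 × 0.6 × 0.5 = 0.12 (all other rules have probability 1).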

The Oslo-Bergen Tagger

  • Description: The Oslo-Bergen tagger is a robust morphological and syntactic tagger developed over several years at the University of Oslo and at Uni Computing in Bergen. The tagger consists of three main modules: a preprocessor with multitagger and compound analyser, a grammar module for morphological and syntactic disambiguation (Constraint Grammar), and a statistical module that removes the last of the remaining morphological ambiguity (only for Bokmål). The Constraint Grammar module uses a compiler developed at the University of Southern Denmark in Odense. The multitagger uses the lexicon Norsk ordbank.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Stemmer/Lemmatizer
  • License: GPL
  • Language: Bokmål and Nynorsk
  • Developer: http://tekstlab.uio.no/obt-ny/english/index.html

The Stanford Parser

  • Description: This package is a Java implementation of probabilistic natural language parsers: highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser. The original version of this parser was mainly written by Dan Klein, with support code and linguistic grammar development by Christopher Manning. Extensive additional work (internationalization and language-specific modeling, flexible input/output, grammar compaction, lattice parsing, k-best parsing, typed dependencies output, user support, etc.) has been done by Roger Levy, Christopher Manning, Teg Grenager, Galen Andrew, Marie-Catherine de Marneffe, Bill MacCartney, Anna Rafferty, Spence Green, Huihsin Tseng, Pi-Chuan Chang, Wolfgang Maier, and Jenny Finkel.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: GPL
  • Language: 3
  • Developer: http://nlp.stanford.edu/index.shtml

TnT -- Statistical Part-of-Speech Tagging

  • Description: TnT, the short form of Trigrams'n'Tags, is a very efficient statistical part-of-speech tagger that is trainable on different languages and virtually any tagset. The component for parameter generation trains on tagged corpora. The system incorporates several methods of smoothing and of handling unknown words. TnT is not optimized for a particular language. Instead, it is optimized for training on a large variety of corpora. Adapting the tagger to a new language, new domain, or new tagset is very easy. Additionally, TnT is optimized for speed. The tagger is an implementation of the Viterbi algorithm for second-order Markov models. The main paradigm used for smoothing is linear interpolation; the respective weights are determined by deleted interpolation. Unknown words are handled by a suffix trie and successive abstraction.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: Proprietary License
  • Language: -
  • Developer: http://www.coli.uni-saarland.de/~thorsten/
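
The smoothing scheme mentioned in the description (linear interpolation of unigram, bigram, and trigram estimates, with weights set by deleted interpolation) can be sketched in Python as follows. This is a minimal illustration of the general technique as it is commonly described for trigram taggers; the function names and the toy corpus in the usage note are assumptions made here, and nothing below is taken from TnT's own code.

```python
from collections import Counter


def ngram_counts(tag_sequences):
    """Count tag unigrams, bigrams, and trigrams in tagged sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for tags in tag_sequences:
        for i, tag in enumerate(tags):
            uni[tag] += 1
            if i >= 1:
                bi[(tags[i - 1], tag)] += 1
            if i >= 2:
                tri[(tags[i - 2], tags[i - 1], tag)] += 1
    return uni, bi, tri


def safe_div(numerator, denominator):
    """Division that yields 0 when the denominator is not positive."""
    return numerator / denominator if denominator > 0 else 0.0


def deleted_interpolation(uni, bi, tri):
    """Estimate weights [l1, l2, l3] for unigram, bigram, trigram models.

    Each observed trigram is removed once from the counts, and its count
    is credited to whichever model order still predicts it best.
    """
    n = sum(uni.values())
    lambdas = [0.0, 0.0, 0.0]
    for (t1, t2, t3), count in tri.items():
        scores = [
            safe_div(uni[t3] - 1, n - 1),                # unigram
            safe_div(bi[(t2, t3)] - 1, uni[t2] - 1),     # bigram
            safe_div(count - 1, bi[(t1, t2)] - 1),       # trigram
        ]
        best = max(range(3), key=lambda k: scores[k])
        lambdas[best] += count
    total = sum(lambdas)
    if total == 0:
        return [1 / 3, 1 / 3, 1 / 3]  # fallback when no trigrams were seen
    return [value / total for value in lambdas]


def interpolated_prob(t1, t2, t3, uni, bi, tri, lambdas):
    """Smoothed P(t3 | t1, t2) as a weighted sum of the three estimates."""
    n = sum(uni.values())
    p_uni = safe_div(uni[t3], n)
    p_bi = safe_div(bi[(t2, t3)], uni[t2])
    p_tri = safe_div(tri[(t1, t2, t3)], bi[(t1, t2)])
    return lambdas[0] * p_uni + lambdas[1] * p_bi + lambdas[2] * p_tri
```

A typical call sequence would be `uni, bi, tri = ngram_counts(corpus)`, then `lambdas = deleted_interpolation(uni, bi, tri)`, then `interpolated_prob("DET", "NOUN", "VERB", uni, bi, tri, lambdas)`, where `corpus` is a list of tag sequences. A Viterbi decoder for a second-order Markov model would use these smoothed values as its transition probabilities.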

Transcript

  • Description:Transcript is a desktop-based manuscript transcription tool that supports word-processor style formatting.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: free or 15 EUR
  • Language: -
  • Developer: Jacob Boerema

Transition-based dependency parser

  • Description: Bernd Bohnet and Joakim Nivre. 2012. A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing. EMNLP-CoNLL, pages 1455-1465.
  • Group: text processing
  • Type: NLP Tools
  • Subtype: Parser
  • License:
  • Language: -
  • Developer: https://code.google.com/p/mate-tools/

Typereader

  • Description: TypeReader® has been on the global market since 1991 and has received hundreds of appraisals from various industry technology magazines. The heart of this award-winning OCR software product, ExperVision®'s OpenRTK®, is the only OCR engine to have won the UNLV test in consecutive years. Commercial (server/desktop).
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: Own license
  • Language: -
  • Developer: -

Typewright

  • Description: TypeWright is a tool for correcting the text version of a document made up of page images.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: ASL 2.0
  • Language: English
  • Developer: -

VARD 2

  • Description: VARD 2 is an interactive piece of software, written in Java, designed to assist users of historical corpora in dealing with spelling variation, particularly in Early Modern English (EModE) texts.
  • Group: Text Processing
  • Type: -
  • Subtype: spelling variations
  • License: Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License.
  • Language: Early Modern English, but can be extended via plugins
  • Developer: Lancaster University

Virtual Transcription Laboratory

  • Description: Virtual Transcription Laboratory is a Virtual Research Environment that works as a crowdsourcing platform for developing high-quality textual representations of digital documents. It gives access to an online OCR service and an easy-to-use transcription editor. Images can be imported from various sources, including direct import from digital libraries.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: free
  • Language: -
  • Developer: Poznań Supercomputing and Networking Center

