Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

Stanford Log-linear Part-Of-Speech Tagger

  • Description: A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token), although computational applications generally use more fine-grained POS tags like 'noun-plural'. This software is a Java implementation of the log-linear part-of-speech taggers described in the developers' papers (if citing just one paper, cite the 2003 one).
  • Group: Text processing
  • Type: NLP Tools
  • Subtype: POS tagger
  • License: GPL2
  • Language: Any
  • Developer: http://nlp.stanford.edu/index.shtml

Stanford CoreNLP

  • Description: Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, and whether they are names of companies, people, etc. The tools also normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities.
  • Group: Text Processing
  • Type: -
  • Subtype: NLP toolset and resources
  • License: GPL v2
  • Language: English
  • Developer: The Stanford NLP Group

Synapse

  • Description: Competitive intelligence always concerns organizations, people, places, products, etc. This technology aims at tagging information in a text flow. The information automatically annotated is basically: persons' names, functions, organizations, dates, events, places, addresses, phone numbers, e-mail addresses, and amounts. The technology is accurate for all types of texts, whatever the field. Whether the text is a legal or military post, a journalistic dispatch on terrorist acts, or economic news, it identifies the actors, their functions and relationships, as well as details of the events encountered. Users can integrate their own dictionaries into the technology.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Commercial
  • Language: 1?
  • Developer: http://www.quaero.org/

T-pen

  • Description: T‑PEN is a web-based tool for working with images of manuscripts. Users attach transcription data (new or uploaded) to the actual lines of the original manuscript in a simple, flexible interface.
  • Group: Text Recognition
  • Type: -
  • Subtype:
  • License: ECL
  • Language: -
  • Developer: Saint Louis University

TXM

  • Description: TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org).
  • Group: Text Processing
  • Type: -
  • Subtype: text analysis tool
  • License: GNU General Public License version 3.0 (GPLv3)
  • Language: English French Russian
  • Developer: ENS DE LYON
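
Two of the analysis tools named in the TXM description, frequency lists and concordances, can be illustrated with a minimal sketch. This is plain Python written for illustration only: it does not use TXM, CQP, or R, and the function names `tokenize`, `frequency_list`, and `concordance` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens, top=10):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of tokens kept on each side of the keyword.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


tokens = tokenize("The cat sat on the mat. The dog sat too.")
print(frequency_list(tokens, top=2))
for left, kw, right in concordance(tokens, "sat", width=2):
    print(f"{left:>10} [{kw}] {right}")
```

A real corpus tool would add indexing, annotation layers, and a query language on top of this idea; the sketch only shows the shape of the two outputs.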

TexLexAn

  • Description: TexLexAn is the project of an automatic text analyzer, classifier, and summarizer. This software is at the frontier of artificial intelligence and machine learning and contributes, at its very modest level, to the development of the software of the future. I have a lot of fun developing it, and I hope you will enjoy trying it.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Text Classification
  • License:
  • Language: 0
  • Developer: http://texlexan.sourceforge.net/

TexLexAn - Text Classification

  • Description: TexLexAn is the project of an automatic text analyzer, classifier, and summarizer. This software is at the frontier of artificial intelligence and machine learning and contributes, at its very modest level, to the development of the software of the future. I have a lot of fun developing it, and I hope you will enjoy trying it.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Text Classification
  • License: Unclear
  • Language: English French German Italian Spanish
  • Developer: http://texlexan.sourceforge.net/

TexLexAn - Summarizer

  • Description: TexLexAn is the project of an automatic text analyzer, classifier, and summarizer. This software is at the frontier of artificial intelligence and machine learning and contributes, at its very modest level, to the development of the software of the future. I have a lot of fun developing it, and I hope you will enjoy trying it.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Summarizer
  • License: Unclear
  • Language: English French German Italian Spanish
  • Developer: http://texlexan.sourceforge.net/

Text and Error Profiler

  • Description: The Text and Error Profiler is software that analyses the OCR output from historical documents, using statistical modelling of document characteristics to improve OCR accuracy. It works by attuning itself to a particular document rather than to common traits of printed documents from a certain era, resulting in a highly adaptive process. The tool uses its document-specific knowledge to allow the batch processing of erroneous words.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: Licence pending. For further information please contact the IMPACT Centre of Competence
  • Language: Language-independent
  • Developer: Centrum für Informations- und Sprachverarbeitung (CIS), University of Munich

TextCat

  • Description: TextCat is an implementation of the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, "N-Gram-Based Text Categorization", in Proceedings of Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, pp. 161-175, 11-13 April 1994.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Text Classification
  • License:
  • Language: 0
  • Developer: http://www.let.rug.nl/vannoord/

TextCat - Language Identification

  • Description: TextCat is an implementation of the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, "N-Gram-Based Text Categorization", in Proceedings of Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, pp. 161-175, 11-13 April 1994.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: free
  • Language: 69
  • Developer: http://www.let.rug.nl/vannoord/
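
The n-gram approach cited in the TextCat entry above can be sketched roughly as follows: build a ranked profile of the most frequent character n-grams for each language, build the same profile for the input text, and pick the language whose profile is closest under an "out-of-place" rank distance. This is an illustrative sketch under stated assumptions, not TextCat's own code: the function names, the `_` padding character, the 300-entry profile size, and the two tiny training samples are all choices made here for the example.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k character n-grams (n = 1..max_n), most frequent first."""
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # mark word boundaries (assumed padding character)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category get a fixed maximum penalty."""
    ranks = {gram: rank for rank, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(rank - ranks[gram]) if gram in ranks else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def classify(text, category_profiles):
    """Return the category whose profile is nearest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(
        category_profiles,
        key=lambda cat: out_of_place(doc_profile, category_profiles[cat]),
    )


# Toy training data, far too small for real use.
samples = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog "
               "sleeps in the sun while the fox runs through the forest",
    "french": "le renard brun rapide saute par-dessus le chien paresseux et puis "
              "le chien dort au soleil pendant que le renard court dans la forêt",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
print(classify("the dog and the fox", profiles))
```

In practice the profiles would be trained on much larger samples per language; with inputs this short, the ranking is dominated by a handful of frequent n-grams.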

TextCat - Text Classification

  • Description: TextCat is an implementation of the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, "N-Gram-Based Text Categorization", in Proceedings of Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, pp. 161-175, 11-13 April 1994.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Text Classification
  • License: Free
  • Language: -
  • Developer: http://www.let.rug.nl/vannoord/

