Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.


Tools

Carleton OCR

  • Description: Code repository for the Carleton OCR comps project, 2010-2011.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: MIT
  • Language: -
  • Developer: -

Chaos

  • Description: CHAOS is a robust syntactic parser for Italian and English. The system implements a modular, lexicalised approach to syntactic parsing. It is based on the eXtended Dependency Graph (XDG), which has been regarded as a useful representation mechanism for shallow parsing. The system offers a collection of modules for designing parsing architectures. The pool of modules consists of:
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Unclear
  • Language: Italian English
  • Developer: http://art.uniroma2.it/external/chaosproject/

Chaos - POS tagger

  • Description: CHAOS is a robust syntactic parser for Italian and English. The system implements a modular, lexicalised approach to syntactic parsing. It is based on the eXtended Dependency Graph (XDG), which has been regarded as a useful representation mechanism for shallow parsing. The system offers a collection of modules for designing parsing architectures. The pool of modules consists of:
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS tagger
  • License: Unclear
  • Language: Italian English
  • Developer: http://art.uniroma2.it/external/chaosproject/

Chaos - Parser

  • Description: CHAOS is a robust syntactic parser for Italian and English. The system implements a modular, lexicalised approach to syntactic parsing. It is based on the eXtended Dependency Graph (XDG), which has been regarded as a useful representation mechanism for shallow parsing. The system offers a collection of modules for designing parsing architectures. The pool of modules consists of:
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: Unclear
  • Language: Italian English
  • Developer: http://art.uniroma2.it/external/chaosproject/

CiceroLite

  • Description: Language Computer's CiceroLite recognizes hundreds of different types of named entities in English, Arabic and Chinese texts with nearly 90% precision and recall. It is available as one of many plug-in NLP components that operate within the Cicero On-Demand server.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Commercial
  • Language: 7
  • Developer: http://www.languagecomputer.com/

Citattest: attesting word forms in dictionary citations

  • Description: With this tool, occurrences of a headword of a historical dictionary, such as the Oxford English Dictionary, are automatically marked in the quotations belonging to that headword.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Annotation tool
  • License:
  • Language: n/a
  • Developer: ivdnt
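
To illustrate the kind of marking Citattest performs, here is a minimal sketch that wraps whole-word, case-insensitive occurrences of a headword's spellings in a quotation. It is not Citattest's actual algorithm: the function name `mark_headword`, the `<hw>` tags and the idea of passing an explicit list of variant forms are all hypothetical, chosen only for illustration.

```python
import re


def mark_headword(quotation: str, forms: list[str],
                  open_tag: str = "<hw>", close_tag: str = "</hw>") -> str:
    """Wrap whole-word, case-insensitive occurrences of any listed word form.

    `forms` is a hypothetical list of spellings for one headword (e.g. modern
    and historical variants); Citattest's own matching is not reproduced here.
    """
    if not forms:
        return quotation  # an empty alternation would match at every word boundary
    # Try longer forms first so that overlapping variants prefer the fuller spelling.
    ordered = sorted(set(forms), key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(1)}{close_tag}", quotation)


if __name__ == "__main__":
    print(mark_headword("He that loveth not knoweth not God", ["love", "loveth"]))
```

A real tool for historical dictionaries would also have to cope with inflection and spelling variation that is not known in advance, which a fixed list of forms cannot capture.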

ClaraOCR

  • Description: Clara OCR is an Optical Character Recognition program. It features both a powerful GUI for the X Window System and a Web interface. The Web interface can collect revision effort from the Internet using a simple revision model, since the program is intended for the cooperative optical recognition of old books. Clara OCR aims to make fine-tuning easy, so that a recognition project can invest resources in tuning the OCR for one specific book, achieving better recognition results and reducing the overall revision cost.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPL
  • Language: -
  • Developer: -

Color Target Quality Checker

  • Description: Fully automatic detection of color targets in digitized printed material, for quality assurance.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype:
  • License: Commercial
  • Language: -
  • Developer: Fraunhofer IAIS

Conjecture

  • Description: Conjecture is a modular, extensible, open-source C++ framework for Optical Character Recognition (OCR). Conjecture is not a single OCR engine but an extensible collection of OCR engines that can be explored, analyzed, compared, extended, modified and merged within a unified environment.
  • Group: Text Processing
  • Type: Core Text Recognition
  • Subtype: Framework
  • License: GPL
  • Language: -
  • Developer: -
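
Comparing OCR engines, as Conjecture is meant to support, usually comes down to scoring each engine's output against a ground-truth transcription. The sketch below ranks engines by character error rate (CER), defined as the Levenshtein edit distance divided by the reference length. It is a generic illustration in Python, not Conjecture's C++ API; the function names and engine labels are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate of an OCR hypothesis against a reference text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(hypothesis, reference) / len(reference)


def rank_engines(outputs: dict[str, str], reference: str) -> list[tuple[str, float]]:
    """Return (engine name, CER) pairs, best (lowest CER) first."""
    scores = ((name, cer(text, reference)) for name, text in outputs.items())
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    ocr = {"engine_a": "the qu1ck brown fox", "engine_b": "tbe qu1ck brovvn fox"}
    for name, score in rank_engines(ocr, "the quick brown fox"):
        print(f"{name}: CER = {score:.3f}")
```

CER is only one possible criterion. Word error rate, or error rates restricted to particular character classes, may be more informative depending on the collection.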

Corpus Based Lexicon Tool (CoBaLT)

  • Description: Corpus Based Lexicon Tool (CoBaLT) is a tool for corpus-based lexicon construction. Users can upload a text dataset (corpus) for use in creating an attestation-based lexicon, and then use the tool to manually correct the automatically lemmatized corpus text. Verified lemmatized words, together with the context in which they appear, are stored in the Information Retrieval Lexicon. The tool can handle plain text and various XML formats, including the IMPACT Page XML format and TEI. Its key requirements are that it can quickly process large quantities of data; that it is a web application that can be run from any computer on the local network; that frequent input actions can be performed with the keyboard; and that information is presented in a way that allows quick evaluation.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Lexicon building
  • License: ASL 2.0
  • Language: -
  • Developer: mathieu.fannee@inl.nl
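
The attestation-based lexicon described above pairs each verified word form and lemma with the context it was found in. The following minimal sketch models that idea as an in-memory data structure. The class and field names (`Attestation`, `AttestationLexicon`, `context_window`) are hypothetical and do not reflect CoBaLT's actual storage schema or the Information Retrieval Lexicon format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """One verified occurrence: a word form, its lemma and where it was seen."""
    word_form: str
    lemma: str
    document_id: str
    context: str


class AttestationLexicon:
    """Hypothetical in-memory store of verified attestations, grouped by lemma."""

    def __init__(self) -> None:
        self._by_lemma: dict[str, list[Attestation]] = defaultdict(list)

    def add_verified(self, attestation: Attestation) -> None:
        self._by_lemma[attestation.lemma].append(attestation)

    def word_forms(self, lemma: str) -> set[str]:
        # .get avoids creating empty entries for lemmas that were never added
        return {a.word_form for a in self._by_lemma.get(lemma, [])}

    def attestations(self, lemma: str) -> list[Attestation]:
        return list(self._by_lemma.get(lemma, []))


def context_window(tokens: list[str], index: int, width: int = 3) -> str:
    """Return the token at `index` with up to `width` tokens on each side."""
    start = max(0, index - width)
    return " ".join(tokens[start:index + width + 1])
```

Grouping by lemma makes it cheap to list every attested spelling of a lemma, which is the kind of lookup an information-retrieval lexicon is used for.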

Cuelanguage

  • Description: cuelanguage is a small library of Java code and resources that provides the following basic natural-language processing capabilities:
  • Group: Text Processing
  • Type: -
  • Subtype: NLP toolset and resources
  • License:
  • Language: -
  • Developer: Jonathan Feinberg

DBpedia Spotlight

  • Description: DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. DBpedia Spotlight recognizes that names of concepts or entities have been mentioned (e.g. "Michael Jordan") and then matches these names to unique identifiers (e.g. dbpedia:Michael_I._Jordan, the machine learning professor, or dbpedia:Michael_Jordan, the basketball player). It can also be used to build solutions for Named Entity Recognition, Keyphrase Extraction, Tagging and other information extraction tasks.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NE linking
  • License: Free
  • Language: -
  • Developer: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
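
DBpedia Spotlight is commonly used through an HTTP annotation service. The sketch below builds an `annotate` request and extracts (surface form, offset, URI) triples from a JSON response using only the Python standard library. The endpoint URL, the `confidence` parameter and the `Resources`/`@surfaceForm`/`@offset`/`@URI` response keys are assumptions based on the public Spotlight web service, so check them against the project wiki before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed public endpoint for English; verify the current URL on the project wiki.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text: str, confidence: float = 0.5) -> Request:
    """Build a GET request asking Spotlight to annotate `text` as JSON."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"})


def parse_annotations(payload: str) -> list[tuple[str, int, str]]:
    """Extract (surface form, character offset, DBpedia URI) from a JSON response."""
    data = json.loads(payload)
    # A response with no recognised mentions may omit "Resources" entirely.
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in data.get("Resources", [])]


def annotate(text: str, confidence: float = 0.5) -> list[tuple[str, int, str]]:
    """Send the request over the network and parse the result."""
    with urlopen(build_request(text, confidence), timeout=30) as response:
        return parse_annotations(response.read().decode("utf-8"))


if __name__ == "__main__":
    for surface, offset, uri in annotate("Michael Jordan taught machine learning."):
        print(offset, surface, uri)
```

Keeping request building and response parsing separate from the network call makes the parsing step testable offline. The `confidence` threshold trades recall for precision when disambiguating names such as "Michael Jordan".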


