Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

289 results

Tools

Brevity

  • Description: Businesses and other organizations often deal with hundreds or even hundreds of thousands of documents. Knowing the content of these documents can be difficult. While you can discern the content of a graphical image at a glance, with text documents you have to read through each one to discern its content. Reading through an entire document takes time - time you don't have to waste. The traditional solution to this problem has been to assign people to read the documents and write a brief abstract for each one. Unfortunately, many organizations simply don't have the resources to assign people to summarize hundreds or even thousands of documents. Brevity provides a solution by generating document summaries for you. The summaries can be as long or as short as you wish. You can also use Brevity to highlight key sentences or words in your document.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Summarizer
  • License: Commercial
  • Language: 1
  • Developer: http://www.lextek.com/

CLAWS part-of-speech tagger for English

  • Description: -
  • Group: Text Processing
  • Type: -
  • Subtype: PoS tagger
  • License: http://ucrel.lancs.ac.uk/claws/purchase.html
  • Language: English
  • Developer: University Centre for Computer Corpus Research on Language

Calamari

  • Description: OCR engine based on OCRopy and Kraken, using Python 3. It is designed to be easy to use from the command line and modular enough to be integrated into and customized from other Python scripts.
  • Group: text recognition
  • Type: ocr (text)
  • Subtype: ocr
  • License: Apache 2.0
  • Language:
  • Developer:

Carleton OCR

  • Description: Code repository for the Carleton OCR comps project 2010-2011
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: MIT
  • Language: -
  • Developer: -

Chaos

  • Description: CHAOS: A robust syntactic parser for Italian and for English. The system implements a modular and lexicalised approach to the syntactic parsing problem. It is based on the notion of eXtended Dependency Graph (XDG) that has been seen as a useful representation mechanism in a shallow parsing approach. The system offers a collection of modules for designing parsing architectures. The pool of modules consists of:
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Unclear
  • Language: Italian, English
  • Developer: http://art.uniroma2.it/external/chaosproject/

Chaos - POS tagger

  • Description: CHAOS: A robust syntactic parser for Italian and for English. The system implements a modular and lexicalised approach to the syntactic parsing problem. It is based on the notion of eXtended Dependency Graph (XDG) that has been seen as a useful representation mechanism in a shallow parsing approach. The system offers a collection of modules for designing parsing architectures. The pool of modules consists of:
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS tagger
  • License: Unclear
  • Language: Italian, English
  • Developer: http://art.uniroma2.it/external/chaosproject/

Chaos - Parser

  • Description: CHAOS: A robust syntactic parser for Italian and for English. The system implements a modular and lexicalised approach to the syntactic parsing problem. It is based on the notion of eXtended Dependency Graph (XDG) that has been seen as a useful representation mechanism in a shallow parsing approach. The system offers a collection of modules for designing parsing architectures. The pool of modules consists of:
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: Unclear
  • Language: Italian, English
  • Developer: http://art.uniroma2.it/external/chaosproject/

CiceroLite

  • Description: Language Computer's CiceroLite recognizes hundreds of different types of named entities in English, Arabic, and Chinese texts with nearly 90% precision and recall. It is available as one of many plug-in NLP components that operate within the Cicero On-Demand server.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Commercial
  • Language: 7
  • Developer: http://www.languagecomputer.com/

Citattest: attesting word forms in dictionary citations

  • Description: This tool automatically marks occurrences of a headword from a historical dictionary, such as the Oxford English Dictionary, in the quotations that belong to that headword.
  • Group: text processing
  • Type: nlp tools
  • Subtype: annotation tool
  • License:
  • Language: n/a
  • Developer: ivdnt

ClaraOCR

  • Description: Clara OCR is an Optical Character Recognition program. It features both a powerful GUI for the X Window System and a Web interface. The Web interface is able to collect revision efforts from the Internet using a simple revision model. It is intended to be used in the cooperative optical recognition of old books. It tries to facilitate fine-tuning, so that an optical recognition project can invest resources in tuning the OCR to achieve better recognition results for one specific book and reduce the overall revision cost.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPL
  • Language: -
  • Developer: -

Color Target Quality Checker

  • Description: Fully automatic color target detection from digitized printed material and quality assurance
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype:
  • License: commercial
  • Language: -
  • Developer: Fraunhofer IAIS

Conjecture

  • Description: Conjecture is a modular, extensible, open-source C++ framework for Optical Character Recognition (OCR). Conjecture is not a single OCR but rather an extensible collection of OCRs that can be explored, analyzed, compared, extended, modified, and merged within a unified environment.
  • Group: Text Processing
  • Type: Core Text Recognition
  • Subtype: Framework
  • License: GPL
  • Language: -
  • Developer: -

