Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

Apache OpenNLP

  • Description: The Apache OpenNLP library is a machine-learning-based toolkit for processing natural-language text.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NLP toolset and resources
  • License: Apache License v2
  • Language: -
  • Developer: The Apache Software Foundation

Apache OpenNLP - NER

  • Description: The Name Finder detects named entities and numbers in text. To do so it needs a model, which depends on the language and the entity type it was trained for. The OpenNLP project offers a number of pre-trained name finder models, trained on various freely available corpora, which can be downloaded from the OpenNLP model download page. To find names in raw text, the text must first be segmented into sentences and tokens; the OpenNLP sentence detector and tokenizer tutorial describes this in detail. It is important that the training data and the input text are tokenized in exactly the same way.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Apache License 2
  • Language: Any
  • Developer: http://opennlp.apache.org/
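
The Name Finder itself relies on a trained statistical model, but the processing order described above (split into sentences, tokenize, then report entity spans over token positions) can be illustrated with a toy Python sketch. The dictionary lookup in `GAZETTEER` is a hypothetical stand-in for a trained model, and `split_sentences`, `tokenize` and `find_names` are illustrative helpers, not part of the OpenNLP API.

```python
import re

# Hypothetical gazetteer standing in for a trained name finder model:
# maps a tuple of tokens to an entity type.
GAZETTEER = {
    ("Pierre", "Vinken"): "person",
    ("Apache", "Software", "Foundation"): "organization",
}

def split_sentences(text):
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words and numbers form one token each; every punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

def find_names(tokens):
    """Return (start, end, type) spans over token indices; end is exclusive."""
    entries = sorted(GAZETTEER.items(), key=lambda kv: -len(kv[0]))  # longest first
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        for entry, etype in entries:
            if tuple(tokens[i:i + len(entry)]) == entry:
                match = (i, i + len(entry), etype)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans

text = "Pierre Vinken joined the Apache Software Foundation. He left in 2001."
for sentence in split_sentences(text):
    print(find_names(tokenize(sentence)))
```

Because the spans refer to token positions, tokenizing the same sentence with `str.split()` leaves `"Foundation."` as a single token and the organization is no longer found, which is why the tokenization of training data and input text must match.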

Apache OpenNLP - NLP toolset and resources

  • Description: The Apache OpenNLP library is a machine-learning-based toolkit for processing natural-language text.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NLP toolset and resources
  • License: Apache License v2
  • Language: -
  • Developer: The Apache Software Foundation

Apache OpenNLP - POS Tagger

  • Description: The Part-of-Speech Tagger marks tokens with their corresponding word type based on the token itself and its context. A token may have multiple POS tags depending on the token and the context. The OpenNLP POS Tagger uses a probability model to predict the correct POS tag out of the tag set. A tag dictionary can be used to limit the possible tags for a token, which improves both the tagging and the runtime performance of the tagger.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: Apache License 2
  • Language: -
  • Developer: http://opennlp.apache.org/
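
The role of a tag dictionary can be sketched in Python with a toy scorer standing in for OpenNLP's probability model (an assumption; a real model uses far richer features and may search over whole tag sequences rather than tagging greedily). The dictionary prunes the candidate tags for known words before the best-scoring tag is picked, so fewer candidates are scored and impossible tags are ruled out. `TAG_SET`, `TAG_DICT`, `score` and `tag` are hypothetical names.

```python
# Hypothetical tag set (Penn Treebank style) and tag dictionary.
TAG_SET = ["DT", "NN", "VB", "VBZ", "JJ"]
TAG_DICT = {
    "the": {"DT"},
    "book": {"NN", "VB"},  # noun ("a book") or verb ("book a room")
}

def score(prev_tag, tag):
    """Toy context model: a fixed prior per tag, boosted for nouns and
    adjectives that follow a determiner. It ignores the token itself."""
    prior = {"DT": 0.1, "NN": 0.3, "VB": 0.35, "VBZ": 0.15, "JJ": 0.1}[tag]
    if prev_tag == "DT" and tag in ("NN", "JJ"):
        prior += 0.5
    return prior

def tag(tokens, tag_dict=None):
    """Greedy left-to-right tagging; tag_dict restricts the candidates per token."""
    tags = []
    prev = None
    for tok in tokens:
        candidates = TAG_SET
        if tag_dict is not None and tok.lower() in tag_dict:
            # Only tags listed in the dictionary for this word are considered.
            candidates = [t for t in TAG_SET if t in tag_dict[tok.lower()]]
        best = max(candidates, key=lambda t: score(prev, t))
        tags.append(best)
        prev = best
    return tags

print(tag(["the", "book"]))            # ['VB', 'VB']
print(tag(["the", "book"], TAG_DICT))  # ['DT', 'NN']
```

Without the dictionary the toy model mistags "the" as a verb and the error propagates to "book"; with it, "the" can only be a determiner, which in turn lets the context favour the noun reading of "book".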

Apache OpenNLP - Tokenizer

  • Description: The OpenNLP tokenizers segment an input character sequence into tokens. Tokens are usually words, punctuation, numbers, etc.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License: Apache License 2
  • Language: -
  • Developer: http://opennlp.apache.org/
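
A minimal Python sketch of rule-based tokenization, assuming a simple character-class scheme in which runs of letters, runs of digits and single punctuation marks each form a token. It also returns character offsets so that tokens can be mapped back to the original text. `tokenize_with_offsets` is a hypothetical helper, not OpenNLP code; OpenNLP additionally provides trainable tokenizers that learn such boundary decisions from annotated data.

```python
import re

# Letter runs (Unicode-aware), digit runs, or any other single non-space character.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|\S")

def tokenize_with_offsets(text):
    """Return (token, start, end) triples such that text[start:end] == token."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]

for token, start, end in tokenize_with_offsets("Mr. Smith paid $25."):
    print(token, start, end)
```

This rule set splits the abbreviation "Mr." into "Mr" and "."; deciding when a period belongs to the preceding word is exactly the kind of boundary a trained tokenizer can learn to handle differently.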

Apache Stanbol

  • Description: Apache Stanbol provides a set of reusable components for semantic content management.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NE linking
  • License: Apache License v2
  • Language: -
  • Developer: The Apache Software Foundation

Asprise

  • Description: The Asprise OCR SDK library for Java lets you add optical character recognition (OCR) capability to your Java applications (Java applets, web applications, standard applications and J2EE enterprise applications).
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: Own license
  • Language: -
  • Developer: -

Augmented SIP Creator (ASC)

  • Description: The ASC uses XSL scripts to transform metadata from a source XML format into a target XML format. It can be used to normalize and validate input metadata from heterogeneous sources.
  • Group: Metadata Processing
  • Type: -
  • Subtype: -
  • License: Commercial
  • Language: -
  • Developer: Fraunhofer IAIS

BIT-Alpha

  • Description: A small French company that offered trainable OCR based on neural networks, with support for Fraktur.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: Commercial
  • Language: German, French
  • Developer: -

Blacklight

  • Description: Blacklight is an open-source Ruby on Rails gem that provides a discovery interface for any Solr index.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: discovery interface
  • License: Creative Commons Attribution-Share Alike 3.0 United States License.
  • Language: -
  • Developer: University of Virginia, Stanford University, Johns Hopkins University and WGBH

Brevity

  • Description: Businesses and other organizations often deal with hundreds or even hundreds of thousands of documents, and knowing what those documents contain can be difficult. While you can grasp the content of a graphical image at a glance, with text documents you have to read through each one to discern its content, and reading an entire document takes time you may not have. The traditional solution has been to assign people to read the documents and write a brief abstract for each one, but many organizations simply don't have the resources to summarize hundreds or even thousands of documents this way. Brevity generates document summaries automatically, and the summaries can be as long or as short as you wish. You can also use Brevity to highlight key sentences or words in a document.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Summarizer
  • License: Commercial
  • Language: 1
  • Developer: http://www.lextek.com/
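
The description does not say how Brevity selects sentences. As a purely illustrative stand-in, the Python sketch below uses a generic frequency-based extractive technique: sentences are scored by the average corpus frequency of their non-stop-words, the top `max_sentences` are kept in their original order (which gives summaries of adjustable length), and the most frequent words serve as "key words". The stop-word list and all function names are assumptions, not Brevity's API.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger ones).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}

def split_sentences(text):
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(sentence):
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=2):
    """Return up to max_sentences top-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        ws = words(sentence)
        # Average frequency, so long sentences are not automatically favoured.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

def key_words(text, n=3):
    """The n most frequent non-stop-words, usable for highlighting."""
    counts = Counter(w for s in split_sentences(text) for w in words(s))
    return [w for w, _ in counts.most_common(n)]

text = ("OCR converts scanned pages to text. OCR quality depends on the scanned pages. "
        "The weather was nice.")
print(summarize(text, max_sentences=2))
print(key_words(text))
```

Averaging rather than summing word frequencies is a design choice of this sketch; summing would bias the summary towards long sentences.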

CLAWS part-of-speech tagger for English

  • Description: -
  • Group: Text Processing
  • Type: -
  • Subtype: POS Tagger
  • License: see http://ucrel.lancs.ac.uk/claws/purchase.html
  • Language: English
  • Developer: University Centre for Computer Corpus Research on Language

