Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

NCSR evaluation tool for ocr

  • Description: This tool evaluates the performance of an optical character recognition system at the character and word level.
  • Group: evaluation
  • Type: ocr (text)
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: National Center for Scientific Research (NCSR) "Demokritos"
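
The listing does not say which metrics the NCSR tool reports. As a rough illustration of what character-level and word-level OCR evaluation can look like, the sketch below computes an edit-distance-based error rate at both levels. The function names (`edit_distance`, `error_rate`, `evaluate_ocr`) are hypothetical and are not taken from the tool itself.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis):
    """Edit distance normalised by the length of the reference."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def evaluate_ocr(ground_truth, ocr_output):
    """Return character error rate (cer) and word error rate (wer)."""
    return {
        "cer": error_rate(ground_truth, ocr_output),
        "wer": error_rate(ground_truth.split(), ocr_output.split()),
    }


scores = evaluate_ocr("the quick brown fox", "the qu1ck brown fox")
print(scores)
```

Here a single misread character affects one of 19 characters but one of 4 words, which is why the two levels are usually reported separately.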

NCSR geometric correction: page curl

  • Description: This tool rectifies document images that suffer from warping and perspective distortions.
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: ner
  • License:
  • Language: n/a
  • Developer: National Center for Scientific Research (NCSR) "Demokritos"
  • Wiki

NE Attestation tool

  • Description: This tool is meant to be used for manual evaluation and correction of automatically matched occurrences of Named Entities in text material. This functionality is used to build Gold Standard Corpora of Named Entities.
  • Group: text processing
  • Type: nlp tools
  • Subtype: annotation tool
  • License:
  • Language: n/a
  • Developer:
  • Wiki

NLTK

  • Description: Tokenizers divide strings into lists of substrings. For example, tokenizers can be used to find the list of sentences or words in a string.
  • Group: text processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License:
  • Language: 0
  • Developer: nltk

NLTK - NER

  • Description:-
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: Free
  • Language: Any
  • Developer: http://www.nltk.org/index.html

NLTK - NLP toolset and resources

  • Description: NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
  • Group: Text Processing
  • Type: -
  • Subtype: NLP toolset and resources
  • License: Apache License
  • Language: -
  • Developer: Dan Garrette, Peter Ljunglöf, Joel Nothman, Mikhail Korobov, Morten Minde Neergaard, Steven Bird

NLTK - Tokenizer

  • Description: Tokenizers divide strings into lists of substrings. For example, tokenizers can be used to find the list of sentences or words in a string.
  • Group: Text processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License: Free
  • Language: -
  • Developer: -
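
A minimal sketch of word and sentence tokenization with NLTK follows. It uses only the regular-expression-based tokenizers, which need no downloaded models. The sentence pattern is a deliberately naive assumption for illustration and would break on abbreviations or decimal numbers.

```python
from nltk.tokenize import RegexpTokenizer, wordpunct_tokenize

text = "Good muffins cost three dollars in New York. Please buy me two of them."

# Word-level tokens: runs of word characters or runs of punctuation.
words = wordpunct_tokenize(text)

# Naive sentence-level tokens: text up to and including ., ! or ?
# (illustrative pattern only, not a robust sentence splitter).
sentence_splitter = RegexpTokenizer(r"[^.!?]+[.!?]")
sentences = [s.strip() for s in sentence_splitter.tokenize(text)]

print(words)
print(sentences)
```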

NLTK Classify Package

  • Description: Classes and interfaces for labeling tokens with category labels (or “class labels”). Typically, labels are represented with strings (such as 'health' or 'sports'). Classifiers can be used to perform a wide range of classification tasks. For example, classifiers can be used...
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Topic Modelling
  • License: free open source
  • Language: -
  • Developer: Parsing:
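
To make the 'health' / 'sports' labelling concrete, here is a small sketch using NLTK's naive Bayes classifier. The four training sentences and the bag-of-words `features` helper are invented for illustration. A real classifier would need far more data.

```python
from nltk.classify import NaiveBayesClassifier


def features(text):
    """Hypothetical feature extractor: one boolean feature per word."""
    return {word: True for word in text.lower().split()}


# Toy training data (made up for this example).
train = [
    (features("the doctor prescribed medicine"), "health"),
    (features("a healthy diet helps the heart"), "health"),
    (features("the team won the match"), "sports"),
    (features("the striker scored a goal"), "sports"),
]

classifier = NaiveBayesClassifier.train(train)

health_label = classifier.classify(features("the doctor recommended a diet"))
sports_label = classifier.classify(features("the team scored a goal"))
print(health_label, sports_label)
```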

NLTK Parsers

  • Description: Classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text’s “parses”. Typically, the text is a single sentence, and the tree structure represents the syntactic structure of the sentence. However, parsers can also be used in other domains. For example, parsers can be used to derive the morphological structure of the morphemes that make up a word, or to derive the discourse structure for a set of utterances.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: free open source
  • Language: -
  • Developer: Metrics:
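
As an illustration of producing a parse tree for a single sentence, the sketch below defines a tiny context-free grammar and parses one sentence with NLTK's chart parser. The grammar is a toy assumption written for this example and covers only a handful of words.

```python
from nltk.grammar import CFG
from nltk.parse import ChartParser

# Toy grammar (illustrative only).
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = ChartParser(grammar)
tokens = "the dog chased the cat".split()

# parse() yields every tree the grammar licenses for the token list.
trees = list(parser.parse(tokens))
tree = trees[0]
print(tree)
```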

NLTK Stemmers

  • Description: Interfaces used to remove morphological affixes from words, leaving only the word stem. Stemming algorithms aim to remove those affixes required for e.g. grammatical role, tense, or derivational morphology, leaving only the stem of the word. This is a difficult problem due to irregular words (e.g. common verbs in English), complicated morphological rules, and part-of-speech and sense ambiguities (e.g. ceil- is not the stem of ceiling).
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Stemmer/Lemmatizer
  • License: free open source
  • Language: -
  • Developer: Python 3:
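
A short sketch of affix removal with NLTK's Porter stemmer, one of several stemmers the package provides. The word list is chosen for illustration. "ceiling" is included so the sense ambiguity mentioned in the description can be inspected in the printed output.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# A rule-based stemmer strips suffixes without knowing the word's sense,
# so the result for "ceiling" is worth checking by eye.
words = ["running", "cats", "ceiling"]
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(word, "->", stem)
```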

NLTK Taggers

  • Description: This package defines several taggers, which take a token list (typically a sentence), assign a tag to each token, and return the resulting list of tagged tokens. Most of the taggers are built automatically based on a training corpus.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: free open source
  • Language: -
  • Developer: Integration:
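
The sketch below shows a tagger built from a training corpus, as the description outlines. A unigram tagger is trained on a two-sentence toy corpus invented for this example and falls back to a default tag for words it has not seen. Real use would train on a properly tagged corpus.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Toy tagged corpus (made up for illustration).
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("a", "DT"), ("cat", "NN"), ("slept", "VBD")],
]

# Words not seen in training fall back to the default tag "NN".
tagger = UnigramTagger(train_sents, backoff=DefaultTagger("NN"))

tagged = tagger.tag(["the", "cat", "barked", "loudly"])
print(tagged)
```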

NeuroOCR

  • Description:Demo neural network OCR
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv3
  • Language: -
  • Developer: -

