Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

File-Analyzer

  • Description: The application allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype:
  • License: unknown
  • Language: -
  • Developer: U.S. National Archives

Franken+

  • Description: The Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas A&M University, as part of its Early Modern OCR Project (eMOP), has created a tool called Franken+ that provides a way to create font training for the Tesseract OCR engine using page images. This is in contrast to Tesseract's documented method of font training, which involves using a word processing program with a modern font. Franken+ works in conjunction with PRImA's Aletheia tool and allows users to easily and quickly identify one or more idealized forms of each glyph found on a set of page images. These identified forms are then used to generate a set of Franken-page images matching the page characteristics documented in Tesseract's training instructions, but using a font from an actual early modern printed document.
  • Group: Text Recognition
  • Type: Training
  • Subtype: -
  • License: Open source
  • Language: -
  • Developer: Bryan Tarpley

Fraunhofer IAIS mydec Deshadow

  • Description: This tool reduces noise caused by transparencies in the physical document.
  • Group: Image Processing
  • Type: Image Processing and Enhancement
  • Subtype: Image Enhancement
  • License:
  • Language: n/a
  • Developer:

FreeLing - Language Identification

  • Description: This module compares the given text with the available models for different languages and returns the most likely language the text is written in. It can be used as a preprocessing step to determine which data files should be used to analyze the text.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: GPL
  • Language: Asturian, Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, Welsh; expandable to any language
  • Developer: http://www.talp.upc.edu/
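
The comparison of a text against per-language models, as described in the entry above, can be illustrated with a minimal sketch. It assumes character-trigram frequency profiles and cosine similarity, which is a common approach but not necessarily the one FreeLing uses; the sample sentences, the `MODELS` table, and all function names are hypothetical.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Return the relative frequency of each character trigram in the text."""
    padded = f"  {text.lower()} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}

def similarity(a, b):
    """Cosine similarity between two trigram profiles."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy training samples (assumption): real models are built from large corpora.
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "Spanish": "el rápido zorro marrón salta sobre el perro perezoso y el gato duerme en la casa",
}
MODELS = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}

def identify_language(text):
    """Return the language whose model is most similar to the text."""
    profile = trigram_profile(text)
    return max(MODELS, key=lambda lang: similarity(profile, MODELS[lang]))

print(identify_language("the dog is in the house"))
print(identify_language("el gato está en la casa"))
```

The result could then be used to select which language-specific data files to load for later analysis steps.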

FreeLing - Lemmatizer

  • Description: This module differs somewhat from the other modules in that it does not enrich the given text. It compares the given text with the available models for different languages and returns the most likely language the text is written in. It can be used as a preprocessing step to determine which data files should be used to analyze the text.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Lemmatizer
  • License: GPL
  • Language: Any
  • Developer: http://www.talp.upc.edu/

FreeLing - Morphological Analysis

  • Description: The morphological analyzer is a meta-module which does not perform any processing of its own. It is just a convenience module to simplify the instantiation of, and calls to, its submodules. At instantiation time it receives a maco_options object containing information about which submodules have to be created and which files have to be used to create them.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Morphological Analysis
  • License: GPL
  • Language: Asturian, Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, Welsh
  • Developer: http://www.talp.upc.edu/

FreeLing - NER

  • Description: There are two different modules able to perform NE recognition. They can be instantiated directly or via a wrapper that will create the right module depending on the configuration file.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NER
  • License: GPL
  • Language: Asturian, Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, Welsh
  • Developer: http://www.talp.upc.edu/

FreeLing - POS Tagger

  • Description: There are two different modules able to perform PoS tagging. The application should decide which method is to be used and instantiate the right class. The first PoS tagger is the hmm_tagger class, which is a classical trigram Markovian tagger following [brants00]. The second module, named relax_tagger, is a hybrid system capable of integrating statistical and hand-coded knowledge, following [padro98a].
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: GPL
  • Language: Asturian, Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, Welsh
  • Developer: http://www.talp.upc.edu/

FreeLing - Parser

  • Description: The dependency parser works in three stages. In the first stage, the rules are used to complete the shallow parsing produced by the chart into a complete parsing tree. The rules are applied to a pair of adjacent chunks; at each step the selected pair is fused into a single chunk, and the process stops when only one chunk remains. The next step is an automatic conversion of the complete parse tree to a dependency tree. Since the parsing grammar encodes information about the head of each rule, the conversion is straightforward. The last step is the labeling: each edge in the dependency tree is labeled with a syntactic function using the rules
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: GPL
  • Language: Asturian, Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, Welsh
  • Developer: http://www.talp.upc.edu/
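
The chunk-fusion idea in the parser entry above can be sketched roughly as follows. This is an illustration only: the rule tables, priorities, labels, and class names are hypothetical and are not FreeLing's grammar format, and for brevity the sketch records labeled dependency edges while fusing instead of running the three stages separately.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    label: str                                  # chunk category, e.g. "np"
    head: str                                   # head word of the chunk
    edges: list = field(default_factory=list)   # (head, dependent, function)

# Hypothetical completion rules:
# (left label, right label) -> (fused label, side holding the head, priority)
COMPLETION_RULES = {
    ("verb", "np"): ("vp", "left", 1),
    ("np", "vp"): ("s", "right", 2),
}

# Hypothetical labeling rules: (head chunk label, dependent chunk label) -> function
LABELING_RULES = {
    ("verb", "np"): "dobj",
    ("vp", "np"): "subj",
}

def parse(chunks):
    """Fuse adjacent chunk pairs until a single chunk remains."""
    chunks = list(chunks)
    while len(chunks) > 1:
        # Collect every adjacent pair that some rule can fuse.
        candidates = [
            (COMPLETION_RULES[(a.label, b.label)][2], i)
            for i, (a, b) in enumerate(zip(chunks, chunks[1:]))
            if (a.label, b.label) in COMPLETION_RULES
        ]
        if not candidates:
            raise ValueError("no completion rule applies")
        _, i = min(candidates)  # lowest priority number is applied first
        left, right = chunks[i], chunks[i + 1]
        new_label, head_side, _ = COMPLETION_RULES[(left.label, right.label)]
        head, dep = (left, right) if head_side == "left" else (right, left)
        function = LABELING_RULES.get((head.label, dep.label), "unlabeled")
        fused = Chunk(
            new_label,
            head.head,
            left.edges + right.edges + [(head.head, dep.head, function)],
        )
        chunks[i:i + 2] = [fused]
    return chunks[0]

tree = parse([Chunk("np", "dog"), Chunk("verb", "chases"), Chunk("np", "cat")])
print(tree.label, tree.head, tree.edges)
```

Because each rule names the side that holds the head, every fusion yields exactly one head-to-dependent edge, which is why the conversion to a dependency tree needs no extra information.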

FreeLing

  • Description: Tokenization rules are regular expressions that are matched against the beginning of the text line being processed. The first matching rule is used to extract the token, the matching substring is deleted from the line, and the process is repeated until the line is empty.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License:
  • Language: 0
  • Developer: http://www.talp.upc.edu/

FreeLing - NLP toolset and resources

  • Description: FreeLing is a library providing language analysis services for Natural Language Processing. It is designed to be used as an external library by any application requiring this kind of service. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line. In practice, many users do not develop on FreeLing but use it as a text processing tool.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: NLP toolset and resources
  • License: GPL
  • Language: Any
  • Developer: http://www.talp.upc.edu/

FreeLing - Tokenizer

  • Description: Tokenization rules are regular expressions that are matched against the beginning of the text line being processed. The first matching rule is used to extract the token, the matching substring is deleted from the line, and the process is repeated until the line is empty.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License: GPL
  • Language: -
  • Developer: http://www.talp.upc.edu/
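
The rule-matching loop described for the tokenizer above can be sketched in a few lines. The rule names and patterns below are hypothetical and chosen only for illustration; they are not FreeLing's actual tokenization rules, and skipping whitespace between tokens is an assumption of this sketch.

```python
import re

# Hypothetical ordered rule set (assumption): earlier rules take precedence.
RULES = [
    ("NUMBER", re.compile(r"\d+(?:[.,]\d+)*")),
    ("WORD", re.compile(r"\w+")),
    ("OTHER", re.compile(r"\S")),
]

def tokenize(line):
    """Extract tokens by repeatedly matching rules at the start of the line."""
    tokens = []
    line = line.lstrip()
    while line:
        for name, pattern in RULES:
            match = pattern.match(line)  # match() anchors at the line start
            if match:
                tokens.append((name, match.group(0)))
                # Delete the matched substring and any following whitespace.
                line = line[match.end():].lstrip()
                break
        else:
            raise ValueError(f"no rule matches: {line!r}")
    return tokens

print(tokenize("Pi is 3.14, ok?"))
```

Rule order matters here: placing the NUMBER rule before WORD keeps "3.14" together as one token instead of splitting it at the decimal point.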

