Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

286 results

Tools

Inventory Extraction

  • Description: Allows for the extraction of a complete list of characters from a document without reference to a specific language dictionary or a library of fonts.
  • Group: Text Recognition
  • Type: -
  • Subtype:
  • License: ASL 2.0
  • Language: Not applicable
  • Developer: University of Innsbruck

Islandora

  • Description: JavaScript-based TEI transcription editor
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: Transcription
  • License: unknown
  • Language: -
  • Developer: Nigel Banks

JGAAP

  • Description: Authorship attribution software
  • Group: Text Processing
  • Type: -
  • Subtype: authorship attribution
  • License: GPL?
  • Language: -
  • Developer: Evaluating Variation in Language Laboratory

JHOVE2

  • Description: The JHOVE2 project generalizes the concept of format characterization to include identification, validation, feature extraction, and policy-based assessment.
  • Group: Text Processing
  • Type: language resources
  • Subtype: discovery interface
  • License:
  • Language: n/a
  • Developer:

JavaOCR

  • Description: This OCR engine is implemented as a Java library along with a demo application which shows the library in action. The core concept at the character level is image matching with automatic position and aspect ratio correction using a least-square-error matching algorithm. It is a very simple yet reasonably effective implementation.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: BSD
  • Language: -
  • Developer: -
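
The matching idea in the JavaOCR description can be illustrated with a minimal sketch. This is not JavaOCR's code or API (the library is written in Java); it is a simplified Python illustration that assumes binary glyph images stored as nested lists. The names `crop_to_ink`, `resize`, `squared_error`, `recognise`, and `TEMPLATES` are hypothetical. Cropping to the inked area stands in for position correction, nearest-neighbour rescaling stands in for aspect ratio correction, and the template with the smallest sum of squared differences wins.

```python
from typing import Dict, List

Grid = List[List[int]]  # 1 = ink, 0 = background


def crop_to_ink(img: Grid) -> Grid:
    """Trim empty rows and columns so the glyph's position no longer matters."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        return [[0]]  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def resize(img: Grid, height: int, width: int) -> Grid:
    """Nearest-neighbour rescale; also normalises the aspect ratio."""
    h, w = len(img), len(img[0])
    return [[img[r * h // height][c * w // width] for c in range(width)]
            for r in range(height)]


def squared_error(a: Grid, b: Grid) -> int:
    """Sum of squared pixel differences between two equally sized grids."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def recognise(img: Grid, templates: Dict[str, Grid]) -> str:
    """Return the template character with the least squared error."""
    glyph = crop_to_ink(img)
    best_char, best_err = "", None
    for char, tmpl in templates.items():
        candidate = resize(glyph, len(tmpl), len(tmpl[0]))
        err = squared_error(candidate, tmpl)
        if best_err is None or err < best_err:
            best_char, best_err = char, err
    return best_char


# Hypothetical 3x3 templates, for illustration only.
TEMPLATES: Dict[str, Grid] = {
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

# An "L" drawn at twice the template size and shifted inside an 8x8 canvas.
sample = [[0] * 8 for _ in range(8)]
for r in range(1, 7):
    sample[r][2] = sample[r][3] = 1
for c in range(2, 8):
    sample[5][c] = sample[6][c] = 1

print(recognise(sample, TEMPLATES))
```

Because the sample is cropped and rescaled before comparison, its offset and size do not affect the result. A real engine would use grey-level images and a much larger template set.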

Jmet2ont

  • Description: A tool that makes it possible to transform metadata from a traditional XML-based schema to RDF/OWL. Mappings are described with XML. Existing mappings used in SYNAT transform traditional library/museum formats to the CIDOC CRM/FRBRoo ontology.
  • Group: Metadata Processing
  • Type: -
  • Subtype: Format transformation (XML)
  • License:
  • Language: null
  • Developer: Poznań Supercomputing and Networking Center

Kognition

  • Description: An omnifont OCR software for KDE. Because each step of the OCR process can be visualized, you can quickly see how OCR works and where the problems lie. However, the program may be of little or no use to end users in its current state.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv2
  • Language: -
  • Developer: -

LX-Parser

  • Description: LX-Parser is a statistical constituency parser for Portuguese. It performs a syntactic analysis of Portuguese sentences in terms of their constituency structure.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: Free
  • Language: Portuguese
  • Developer: http://lxcenter.di.fc.ul.pt/home/en/index.html

LX-Tagger

  • Description: LX-Tagger is a part-of-speech tagger for Portuguese that assigns a single morpho-syntactic tag from its tagset to every token.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS tagger
  • License: Proprietary
  • Language: Portuguese
  • Developer: http://lxcenter.di.fc.ul.pt/home/en/index.html

LemmaGen

  • Description: The LemmaGen project aims to provide a standardized, open-source, multilingual platform for lemmatisation. The work began because no high-quality lemmatiser existed for the Slovene language. It now offers lemmatisers for Slovene and 11 other European languages, as well as a system that can learn lemmatisation rules for new languages from existing wordform-lemma pair examples.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Stemmer/Lemmatizer
  • License: free open source
  • Language: Slovene, 11 more
  • Developer: matjaz.jursic@ijs.si
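
Learning lemmatisation rules from wordform-lemma pairs, as the LemmaGen description mentions, can be sketched in a few lines. This is not LemmaGen's algorithm or API; it is a deliberately simplified suffix-rule learner. The function names (`suffix_rule`, `learn_rules`, `lemmatise`) and the English training pairs are hypothetical and chosen only for readability.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def suffix_rule(word: str, lemma: str) -> Tuple[str, str]:
    """Drop the shared prefix and return (suffix to strip, suffix to add)."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return word[i:], lemma[i:]


def learn_rules(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """For each word suffix seen in training, keep its most frequent replacement."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for word, lemma in pairs:
        strip, add = suffix_rule(word, lemma)
        if strip:  # ignore pairs where nothing has to be stripped
            votes[strip][add] += 1
    return {strip: counts.most_common(1)[0][0] for strip, counts in votes.items()}


def lemmatise(word: str, rules: Dict[str, str]) -> str:
    """Apply the rule for the longest matching suffix; otherwise return the word."""
    for start in range(len(word)):
        suffix = word[start:]
        if suffix in rules:
            return word[:start] + rules[suffix]
    return word


# Hypothetical training pairs (wordform, lemma), for illustration only.
PAIRS = [
    ("walking", "walk"), ("talking", "talk"),
    ("cats", "cat"), ("dogs", "dog"),
    ("studies", "study"), ("parties", "party"),
]

rules = learn_rules(PAIRS)
for form in ("jumping", "ponies", "birds", "tree"):
    print(form, "->", lemmatise(form, rules))
```

A learner this simple over-applies its rules (it would turn "bus" into "bu"), which is one reason real systems condition rules on more context and handle exceptions.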

Leptonica

  • Description: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.
  • Group: Image Processing
  • Type: Image Processing and Enhancement
  • Subtype: toolbox
  • License: Own license (similar to ASL)
  • Language: -
  • Developer: Dan Bloomberg

Lextek

  • Description: For many applications it is important to correctly identify the language that a document or piece of text is written in. The Lextek Language Identifier enables you to do this. Since some languages may be written in several character encodings, the Lextek Language Identifier also automatically identifies the character encoding the text was written in. Supporting approximately 260 different languages and character encodings, the Lextek Language Identifier gives you the ability to automatically recognize more languages and encodings than any other language identifier available. We are adding more languages all the time and work closely with our customers to ensure that their language recognition needs are fully supported.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: commercial
  • Language: 260
  • Developer: http://www.lextek.com/

