Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

287 results

Tools

Impact Tools - Spelling variations

  • Description: The spelling of words in historical texts can differ widely from modern spelling. There are two general approaches to matching different spellings. The first is to use rewrite rules that transform words in one spelling into another. For a historical dictionary that covers a large timespan, and in which variation is not limited to orthography, this approach is not satisfactory, so the second approach, statistical matching, is often needed.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Spelling variations
  • License: ASL 2.0
  • Language: -
  • Developer: http://www.inl.nl/home
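The rewrite-rule approach described for this tool can be sketched roughly as follows. This is a minimal illustration, not the Impact implementation: the rules, the function names `candidates` and `match`, and the example words are all assumptions chosen for the sketch.

```python
import re

# Hypothetical rewrite rules (historical pattern -> modern replacement).
# Illustrative only; these are not taken from the Impact tools.
RULES = [
    (r"ck\b", "k"),
    (r"ae", "aa"),
    (r"gh", "g"),
]

def candidates(word, rules=RULES):
    """Return every form reachable by applying any subset of the rules, in order."""
    forms = {word}
    for pattern, replacement in rules:
        # Keep the unrewritten forms too, so each rule is optional.
        forms |= {re.sub(pattern, replacement, form) for form in forms}
    return forms

def match(word, lexicon, rules=RULES):
    """Return the modern lexicon entries that the historical word can be rewritten to."""
    return sorted(candidates(word, rules) & set(lexicon))
```

With a small modern lexicon such as `{"jaar", "boek", "dag"}`, calling `match("jaer", lexicon)` would look up the rewritten candidates in the lexicon. A word whose variation is not covered by any rule returns an empty list, which is the case where a statistical method would have to take over.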

Inventory Extraction

  • Description: Allows for the extraction of a complete list of characters from a document without reference to a specific language dictionary or a library of fonts.
  • Group: Text Recognition
  • Type: -
  • Subtype:
  • License: ASL 2.0
  • Language: Not applicable
  • Developer: University of Innsbruck

Islandora

  • Description: JavaScript-based TEI transcription editor
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: Transcription
  • License: unknown
  • Language: -
  • Developer: Nigel Banks

JGAAP

  • Description: Authorship attribution software
  • Group: Text Processing
  • Type: -
  • Subtype: authorship attribution
  • License: GPL?
  • Language: -
  • Developer: Evaluating Variation in Language Laboratory

JHOVE2

  • Description: The JHOVE2 project generalizes the concept of format characterization to include identification, validation, feature extraction, and policy-based assessment.
  • Group: text processing
  • Type: language resources
  • Subtype: discovery interface
  • License:
  • Language: n/a
  • Developer:

JavaOCR

  • Description: This OCR engine is implemented as a Java library along with a demo application which shows the library in action. The core concept at the character level is image matching with automatic position and aspect ratio correction using a least-square-error matching algorithm. It is a very simple yet reasonably effective implementation.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: BSD
  • Language: -
  • Developer: -
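The character-level matching idea in this description can be illustrated with a small sketch. It is written in Python rather than Java and is not JavaOCR's code: cropping to the bounding box and nearest-neighbour resampling are assumed stand-ins for the position and aspect ratio correction, and the `TEMPLATES` glyphs and all function names are hypothetical.

```python
def crop(img):
    """Crop a binary image (rows of 0/1) to the bounding box of its ink.

    Assumes the image contains at least one ink pixel.
    """
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def resize(img, size=8):
    """Nearest-neighbour resample to size x size, normalising scale and aspect ratio."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def squared_error(a, b):
    """Sum of squared pixel differences between two equally sized images."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def classify(img, templates, size=8):
    """Return the label of the template with the least squared error."""
    norm = resize(crop(img), size)
    return min(
        templates,
        key=lambda label: squared_error(norm, resize(crop(templates[label]), size)),
    )

# Hypothetical 3x3 glyph templates, for illustration only.
TEMPLATES = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
```

Because every glyph is cropped and rescaled to the same grid before comparison, a larger "L" drawn off-centre on a bigger canvas should still score closest to the "L" template. One limitation of this simplification is that glyphs distinguished only by their aspect ratio become indistinguishable after normalisation.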

Jmet2ont

  • Description: A tool that makes it possible to transform metadata from a traditional XML-based schema to RDF/OWL. Mappings are described with XML. Existing mappings used in SYNAT transform traditional library/museum formats to the CIDOC CRM/FRBRoo ontology.
  • Group: metadata processing
  • Type: -
  • Subtype: Format transformation (XML)
  • License:
  • Language: -
  • Developer: Poznań Supercomputing and Networking Center

Kognition

  • Description: An omnifont OCR software for KDE. Because each step of the OCR process can be visualized, you can get a quick idea of how OCR works and where the problems lie. However, the program may be of little or no use to end users in its current state.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv2
  • Language: -
  • Developer: -

LX-Parser

  • Description: LX-Parser is a statistical constituency parser for Portuguese. It performs a syntactic analysis of Portuguese sentences in terms of their constituency structure.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Parser
  • License: Free
  • Language: Portuguese
  • Developer: http://lxcenter.di.fc.ul.pt/home/en/index.html

LX-Tagger

  • Description: LX-Tagger is a part-of-speech tagger for Portuguese that assigns a single morpho-syntactic tag from its tagset to every token.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS tagger
  • License: Proprietary
  • Language: Portuguese
  • Developer: http://lxcenter.di.fc.ul.pt/home/en/index.html

LemmaGen

  • Description: The LemmaGen project aims to provide a standardized open source multilingual platform for lemmatisation. We started this work because of the lack of a high-quality lemmatiser for the Slovene language. Currently we have lemmatisers not only for Slovene but also for 11 other European languages, as well as a system that can learn lemmatisation rules for new languages from existing wordform-lemma pair examples.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Stemmer/Lemmatizer
  • License: free open source
  • Language: Slovene, 11 more
  • Developer: matjaz.jursic@ijs.si
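Learning lemmatisation rules from wordform-lemma pairs, as mentioned in this description, can be illustrated with a deliberately simple suffix-rule learner. This is an assumed stand-in for illustration, not LemmaGen's actual algorithm or API; the function names and the English example pairs are hypothetical.

```python
from collections import Counter, defaultdict
from os.path import commonprefix

def learn_rules(pairs):
    """Learn suffix rewrite rules (word suffix -> lemma suffix) from (wordform, lemma) pairs."""
    votes = defaultdict(Counter)
    for word, lemma in pairs:
        # Whatever follows the shared prefix is treated as the suffix to rewrite.
        stem = commonprefix([word, lemma])
        votes[word[len(stem):]][lemma[len(stem):]] += 1
    # Keep the most frequent lemma suffix for each word suffix.
    return {suffix: counts.most_common(1)[0][0] for suffix, counts in votes.items()}

def lemmatise(word, rules):
    """Apply the rule with the longest matching non-empty suffix, else return the word unchanged."""
    for start in range(len(word)):
        suffix = word[start:]
        if suffix in rules:
            return word[:start] + rules[suffix]
    return word
```

Trained on pairs such as `("walked", "walk")` and `("cats", "cat")`, the learner would store the rules `"ed" -> ""` and `"s" -> ""` and apply them to unseen words with the same endings. A real system needs to handle exceptions and competing rules far more carefully than this sketch does.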

Leptonica

  • Description: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.
  • Group: Image processing
  • Type: Image Processing and Enhancement
  • Subtype: toolbox
  • License: Own license (similar to ASL)
  • Language: -
  • Developer: Dan Bloomberg

