Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

Impact Polish Institutional Dataset

  • Description: The image collection for the Polish language is provided by Poznań Supercomputing and Networking Center. The dataset consists of 11,020 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: CC-BY
  • Language: Polish
  • Developer: Poznań Supercomputing and Networking Center

Impact Slovene Demonstrator Dataset

  • Description: The Slovene ground truth, produced by the National and University Library of Slovenia (NUK) within the EU-funded Impact project, consists of 4,937 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: CC-BY-NC-SA
  • Language: Slovene
  • Developer: National and University Library of Slovenia

Impact Slovene Historical Lexicon

  • Description: Apart from about 40 pages from a sixteenth-century and a seventeenth-century book, the dataset for historical Slovene contains material published from the second half of the eighteenth century to the end of the nineteenth century. The material consists of books and one daily newspaper. The current lexicon consists of the initial 3,000 lexical entries developed in LeXtractor and the lexicon that can be automatically extracted from the manually validated tokens of the reference corpus. At the time of writing, the size of the lexica extracted from the manually validated corpus tokens was as follows: 16,245 lexical entries, 15,715 word forms, 14,249 normalized, 11,396 modernized, and 6,789 lemmata.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY
  • Language: Slovene
  • Developer: Jozef Stefan Institute

Impact Slovene Institutional Dataset

  • Description: The image collection for the Slovene language is provided by the National and University Library of Slovenia. The dataset consists of 41,313 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: CC-BY-NC-SA
  • Language: Slovene
  • Developer: National and University Library of Slovenia

Impact Spanish Demonstrator Dataset

  • Description: The Spanish ground truth, produced by Universidad de Alicante (UA) within the EU-funded Impact project, consists of 11,444 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: CC-BY-NC-SA
  • Language: Spanish
  • Developer: National Library of Spain

Impact Spanish Historical Lexicon

  • Description: Fourteen works of Spanish literature and a dictionary (consisting of 6 volumes) were selected for the IMPACT Demonstrator dataset. Most books are from the sixteenth or seventeenth century, known as the Spanish Golden Age. They are mostly literary works: religious plays, novels, poetry, etc. Just one book belongs to the eighteenth century, as does the Diccionario de Autoridades. Two of these books are from America: Cartha Athenagorica by Sor Juana Inés de la Cruz and Commentarios reales by Inca Garcilaso de la Vega; they were selected in order to register the vocabulary of Spanish in Latin America. Apart from these books, 86 works from between the late 15th century and the 17th century were selected from Biblioteca Virtual Miguel de Cervantes, comprising almost 2 million tokens and 90,000 word forms. The current lexicon consists of 11,846 lemmata, 31,584 word forms, and 36,857 lemma/word form combinations.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY-NC-SA
  • Language: Spanish
  • Developer: University of Alicante

Impact Spanish Institutional Dataset

  • Description: The image collection for the Spanish language is provided by Biblioteca Nacional de España. The dataset consists of 60,180 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: CC-BY-NC-ND
  • Language: Spanish
  • Developer: National Library of Spain

Impact Tools

  • Description: The spelling of words in historical texts can differ widely from modern spelling. There are two general approaches to matching different spellings. First, it is possible to use rewrite rules that transform words in one spelling into another. For a historical dictionary that covers a large timespan and in which variation is not limited to orthography, this approach is not satisfactory. Therefore, the use of statistics is often needed.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Spelling variations
  • License:
  • Language: 0
  • Developer: http://www.inl.nl/home

Impact Tools - Lemmatization

  • Description: IMPACT provides tools for: 1. Reducing historical word forms to one or several possible modern lemmas (lemmatization). 2. Expanding lemma lists with part-of-speech information to possible ("hypothetical") full forms.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Lemmatization
  • License: ASL 2.0
  • Language: -
  • Developer: http://www.inl.nl/home
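The two tasks named in the lemmatization entry above can be illustrated with a minimal sketch. It assumes a lookup-based approach and is not the IMPACT implementation: the toy lexicon `ATTESTED`, the suffix table `PARADIGMS`, and the functions `build_index`, `lemmatize`, and `expand` are all hypothetical names invented for this example.

```python
from collections import defaultdict

# Hypothetical toy lexicon of (historical word form, modern lemma) pairs.
# These entries are illustrative only and do not come from any IMPACT lexicon.
ATTESTED = [
    ("vnto", "unto"),
    ("loue", "love"),
    ("loued", "love"),
    ("saw", "see"),
    ("saw", "saw"),
]

# Hypothetical suffix paradigms keyed by part of speech (also illustrative).
PARADIGMS = {"VERB": ["", "s", "ed", "ing"], "NOUN": ["", "s"]}


def build_index(pairs):
    """Map each lower-cased word form to the set of lemmas it may belong to."""
    index = defaultdict(set)
    for form, lemma in pairs:
        index[form.lower()].add(lemma)
    return index


def lemmatize(form, index):
    """Return all candidate modern lemmas for a word form (possibly none)."""
    return sorted(index.get(form.lower(), ()))


def expand(lemma, pos, paradigms=PARADIGMS):
    """Generate hypothetical full forms by appending each suffix for the POS."""
    return [lemma + suffix for suffix in paradigms.get(pos, [""])]
```

For example, `lemmatize("saw", build_index(ATTESTED))` yields two candidate lemmas because the form is ambiguous, while `expand("walk", "VERB")` produces forms that may or may not be attested, which is why the entry calls them "hypothetical".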

Impact Tools - Spelling variations

  • Description: The spelling of words in historical texts can differ widely from modern spelling. There are two general approaches to matching different spellings. First, it is possible to use rewrite rules that transform words in one spelling into another. For a historical dictionary that covers a large timespan and in which variation is not limited to orthography, this approach is not satisfactory. Therefore, the use of statistics is often needed.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Spelling variations
  • License: ASL 2.0
  • Language: -
  • Developer: http://www.inl.nl/home
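The rewrite-rule approach mentioned in the spelling variations entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the IMPACT tool itself: the rule list `REWRITE_RULES` and the functions `candidate_spellings` and `match_modern` are hypothetical, and the rules are invented examples rather than rules shipped with any real tool.

```python
import re

# Hypothetical rewrite rules as (regex pattern, replacement) pairs.
# Illustrative only; real rule sets are language- and period-specific.
REWRITE_RULES = [
    (r"ck$", "k"),   # word-final "ck" -> "k"
    (r"ae", "aa"),   # "ae" -> "aa"
]


def candidate_spellings(word, rules=REWRITE_RULES):
    """Return the word plus every spelling reachable by applying any
    subset of the rules, in the order the rules are listed."""
    candidates = {word}
    for pattern, replacement in rules:
        rewritten = {re.sub(pattern, replacement, c) for c in candidates}
        candidates |= rewritten
    return candidates


def match_modern(word, modern_lexicon, rules=REWRITE_RULES):
    """Return the candidate spellings that occur in a modern word list."""
    return sorted(candidate_spellings(word, rules) & set(modern_lexicon))
```

With the toy rules, `match_modern("boeck", {"boek", "jaar"})` matches `boek`. A purely rule-based matcher like this only handles orthographic variation, which is the limitation the entry gives as the reason statistical methods are often needed.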

Inventory Extraction

  • Description: Extracts a complete list of characters from a document without reference to a specific language dictionary or a library of fonts.
  • Group: Text Recognition
  • Type: -
  • Subtype:
  • License: ASL 2.0
  • Language: Not applicable
  • Developer: University of Innsbruck

Islandora

  • Description: JavaScript-based TEI transcription editor.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: Transcription
  • License: unknown
  • Language: -
  • Developer: Nigel Banks

