Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

286 results

Tools

Impact English Historical Lexicon

  • Description: The Historical Lexicon of English covers the period from 1497 to 1900. The source material consists of books, newspapers and papers. The English IR lexicon was built with the IMPACT dictionary attestation tool from the quotations of the OED (Oxford English Dictionary). The lexicon currently contains 874311 lemma/word form combinations.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: pending
  • Language: English
  • Developer: Instituut voor Nederlandse Lexicologie

Impact English Named Entities Lexica

  • Description: The Core Named Entities Lexicon for English is an elaborate database of enriched historical English locations, person names and organisations from the period 1742-1899. It can be used as a lexicon for OCR and for query expansion in retrieval.
  • Group: Data
  • Type: Language resources
  • Subtype: Named entities lexica
  • License: pending
  • Language: English
  • Developer: Instituut voor Nederlandse Lexicologie
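
The description above mentions query expansion in retrieval. A minimal sketch of that idea follows, assuming the lexicon can be read as pairs of a modern lemma and a historical word form. The sample entries, `build_index` and `expand_query` are hypothetical names and data made up for illustration; they do not reflect the lexicon's actual format or content.

```python
from collections import defaultdict

# Hypothetical sample entries (lemma, word form), invented for illustration only.
SAMPLE_ENTRIES = [
    ("London", "London"),
    ("London", "Londres"),
    ("London", "Lundun"),
    ("York", "York"),
    ("York", "Yorke"),
]


def build_index(entries):
    """Map each lower-cased lemma to the set of word forms listed for it."""
    index = defaultdict(set)
    for lemma, form in entries:
        index[lemma.lower()].add(form)
    return index


def expand_query(term, index):
    """Return the query term plus any variant forms listed under it, sorted."""
    variants = index.get(term.lower(), set())
    return sorted({term} | variants)


index = build_index(SAMPLE_ENTRIES)
print(expand_query("York", index))   # ['York', 'Yorke']
print(expand_query("Paris", index))  # ['Paris'] (no variants known)
```

A search front end could then OR the expanded forms together, so that a query for the modern name also matches historical spellings in the OCR text.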

Impact French Demonstrator Dataset

  • Description: The French ground truth produced by the Bibliothèque nationale de France (BnF) as part of the EU-funded IMPACT project consists of 7,980 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: pending
  • Language: French
  • Developer: National Library of France

Impact French Historical Lexicon

  • Description: The Historical Lexicon of French focuses on 17th-century, late Renaissance French. This is an intermediate period between Middle French (covered by LGeRM data - see http://www.atilf.fr/dmf/LGeRM/) and Modern French (covered by Morphalou data - see http://www.cnrtl.fr/lexiques/morphalou/). The current lexicon consists of 27508 lemmata, 115201 word forms and 141152 lemma/word form combinations.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY-SA
  • Language: French
  • Developer: ATILF

Impact French Institutional Dataset

  • Description: The French-language image collection is provided by the Bibliothèque nationale de France. The dataset consists of 96,950 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: pending
  • Language: French
  • Developer: National Library of France

Impact German Demonstrator Dataset (BSB)

  • Description: The German ground truth produced by the Bayerische Staatsbibliothek (BSB) as part of the EU-funded IMPACT project consists of 3,054 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: Bavarian State Library

Impact German Demonstrator Dataset (ONB)

  • Description: The German ground truth produced by the Österreichische Nationalbibliothek (ONB) as part of the EU-funded IMPACT project consists of 887 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: National Library of Austria

Impact German Historical Lexicon

  • Description: The German historical corpus consists of 510 texts of varying length and different genres. It contains 3552690 tokens (words in running text) and 369730 types (unique words) in total. As the texts originate from 1350-1950, the German corpus contains material from both the Early New High German period (1350-1650) and the New High German period (since 1650), covering all subperiods as well.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY-NC-SA
  • Language: German
  • Developer: University of Munich
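
The description above distinguishes tokens (words in running text) from types (unique words). The sketch below shows one way to compute the two counts, assuming simple regex tokenisation and case folding. Both choices are assumptions made for illustration and are not necessarily how the corpus figures were produced. `token_type_counts` and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def token_type_counts(text):
    """Return (number of tokens, number of types) for a text.

    Assumption: a token is any run of word characters, compared case-insensitively.
    """
    tokens = re.findall(r"\w+", text.lower())
    types = Counter(tokens)  # one key per distinct token
    return len(tokens), len(types)


sample = "der Hund und die Katze und der Vogel"
n_tokens, n_types = token_type_counts(sample)
print(n_tokens, n_types)  # 8 6
```

In the sample, "der" and "und" each occur twice, so eight running words reduce to six unique ones.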

Impact German Institutional Dataset (BSB)

  • Description: The German-language image collection is provided by the Bayerische Staatsbibliothek. The dataset consists of 66,627 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: Bavarian State Library

Impact German Institutional Dataset (ONB)

  • Description: The German-language image collection is provided by the Österreichische Nationalbibliothek. The dataset consists of 110,034 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: National Library of Austria

Impact German Named Entities Lexica

  • Description: The Core Named Entities Lexicon for German is a set of named entities (historical German locations, person names and organisations) that are likely to appear in a wide variety of texts, with extensions specific to the text types targeted by IMPACT according to scope information provided by the ONB (Austrian National Library) and BSB (Bavarian State Library). It can be used as a lexicon for OCR and for query expansion in retrieval.
  • Group: Data
  • Type: Language resources
  • Subtype: Named entities lexica
  • License: pending
  • Language: German
  • Developer: University of Munich

