Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

Impact English Named Entities Lexica

  • Description: The Core Named Entities Lexicon for English is an elaborate database of enriched historical English locations, person names and organisations from the period 1742-1899. It can be used as a lexicon for OCR and for query expansion in retrieval.
  • Group: Data
  • Type: Language resources
  • Subtype: Named entities lexica
  • License: pending
  • Language: English
  • Developer: Instituut voor Nederlandse Lexicologie

Impact French Demonstrator Dataset

  • Description: The French ground truth, produced by the Bibliothèque nationale de France (BnF) within the EU-funded IMPACT project, consists of 7,980 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: pending
  • Language: French
  • Developer: National Library of France

Impact French Historical Lexicon

  • Description: The Historical Lexicon of French focuses on 17th-century, late Renaissance French, an intermediate period between Middle French (covered by LGeRM data; see http://www.atilf.fr/dmf/LGeRM/) and Modern French (covered by Morphalou data; see http://www.cnrtl.fr/lexiques/morphalou/). The current lexicon consists of 27,508 lemmata, 115,201 word forms and 141,152 lemma/word form combinations.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY-SA
  • Language: French
  • Developer: ATILF

Impact French Institutional Dataset

  • Description: The image collection for the French language is provided by the Bibliothèque nationale de France. The dataset consists of 96,950 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: pending
  • Language: French
  • Developer: National Library of France

Impact German Demonstrator Dataset (BSB)

  • Description: The German ground truth, produced by the Bayerische Staatsbibliothek (BSB) within the EU-funded IMPACT project, consists of 3,054 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: Bavarian State Library

Impact German Demonstrator Dataset (ONB)

  • Description: The German ground truth, produced by the Österreichische Nationalbibliothek (ONB) within the EU-funded IMPACT project, consists of 887 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: National Library of Austria

Impact German Historical Lexicon

  • Description: The German historical corpus consists of 510 texts of varying length and genre. It contains 3,552,690 tokens (words in running text) and 369,730 types (unique words) in total. As the texts originate from 1350-1950, the corpus contains material from both the Early New High German period (1350-1650) and the New High German period (since 1650), covering all subperiods as well.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY-NC-SA
  • Language: German
  • Developer: University of Munich

Impact German Institutional Dataset (BSB)

  • Description: The image collection for the German language is provided by the Bayerische Staatsbibliothek. The dataset consists of 66,627 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: Bavarian State Library

Impact German Institutional Dataset (ONB)

  • Description: The image collection for the German language is provided by the Österreichische Nationalbibliothek. The dataset consists of 110,034 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: pending
  • Language: German
  • Developer: National Library of Austria

Impact German Named Entities Lexica

  • Description: The Core Named Entities Lexicon for German is a set of named entities (historical German locations, person names and organisations) that are likely to appear in a wide variety of texts, with extensions specific to the text types targeted by IMPACT according to scope information provided by the ONB (Austrian National Library) and BSB (Bavarian State Library). It can be used as a lexicon for OCR and for query expansion in retrieval.
  • Group: Data
  • Type: Language resources
  • Subtype: Named entities lexica
  • License: pending
  • Language: German
  • Developer: University of Munich

Impact Polish Demonstrator Dataset

  • Description: The Polish ground truth, produced by the Poznań Supercomputing and Networking Center (PSNC) within the EU-funded IMPACT project, consists of 4,693 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: CC-BY
  • Language: Polish
  • Developer: Poznań Supercomputing and Networking Center

Impact Polish Historical Lexicon

  • Description: The primary resource was the Internet dictionary we shall refer to as the "Late Middle Polish dictionary", its official name being "The dictionary of the Polish language of the sixteenth and the first half of the seventeenth century". The current lexicon consists of 9,909 lemmata, 24,977 word forms and 26,736 lemma/word form combinations. A set of more than 100 rules for the historical spelling of Polish, developed for the IMPACT project, is also available.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: pending
  • Language: Polish
  • Developer: University of Warsaw

