Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

Impact Bulgarian Historical Lexicon

  • Description: The current lexicon consists of 28,857 lexical entries developed in LeXtractor. The historical lexicon extracted from the manually validated corpus tokens comprises 26,148 word forms, 25,861 normalised forms, 21,115 modernised forms and 11,090 lemmata. The lexicon is currently available in LeXtractor and TEI P5 XML formats and in the IMPACT database structure.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY-NC-SA
  • Language: Bulgarian
  • Developer: Bulgarian Academy of Sciences

Impact Bulgarian Institutional Dataset

  • Description: The image collection for the Bulgarian language is provided by the National Library of Bulgaria. The dataset consists of 4,240 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: CC-BY-NC-ND
  • Language: Bulgarian
  • Developer: National Library of Bulgaria

Impact Czech Demonstrator Dataset

  • Description: The Czech ground truth produced by the Národní knihovna České republiky (National Library of the Czech Republic, NKC) within the EU-funded IMPACT project consists of 5,049 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: CC-BY-NC-SA
  • Language: Czech
  • Developer: National Library of the Czech Republic

Impact Czech Historical Lexicon

  • Description: The Historical Lexicon of Czech covers the period from 1800 to 1900. The current lexicon is divided into the following periods:
      - 1801-1809: 16,052 lemmata, 311,362 word forms and 321,099 lemma/word form combinations.
      - 1810-1842: 16,056 lemmata, 297,122 word forms and 304,711 lemma/word form combinations.
      - 1843-1849: 9,406 lemmata, 178,783 word forms and 183,079 lemma/word form combinations.
      - 1850 onwards: 31,954 lemmata, 506,663 word forms and 518,628 lemma/word form combinations.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: CC-BY-NC-ND
  • Language: Czech
  • Developer: Charles University Prague

Impact Czech Institutional Dataset

  • Description: The image collection for the Czech language is provided by the National Library of the Czech Republic. The dataset consists of 75,559 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: CC-BY-NC-SA
  • Language: Czech
  • Developer: National Library of the Czech Republic

Impact Dutch Named Entities Lexica

  • Description: The Core Named Entities Lexicon for Dutch is an elaborate database of enriched historical Dutch location names, person names and organisation names from the period 1750-1945. It can be used as a lexicon for OCR and for query expansion in retrieval.
  • Group: Data
  • Type: Language resources
  • Subtype: Named entities lexica
  • License: pending
  • Language: Dutch
  • Developer: Instituut voor Nederlandse Lexicologie

Impact Dutch Demonstrator Dataset

  • Description: The Dutch ground truth produced by the Koninklijke Bibliotheek (KB) within the EU-funded IMPACT project consists of 3,439 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: No restriction (except newspapers: no usage allowed)
  • Language: Dutch
  • Developer: National Library of the Netherlands

Impact Dutch Historical Lexicon

  • Description: The Historical Lexicon of Dutch covers the period from 1600 to 1940; the material used consists of books, newspapers and parliamentary papers. The Dutch IR lexicon was built with the IMPACT dictionary attestation tool from the quotations in the WNT (Dictionary of the Dutch Language). The lexicon currently contains 475,498 distinct word forms, 215,180 lemmata and 558,438 distinct lemma/word form combinations, with 1,636,709 attestations.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: pending
  • Language: Dutch
  • Developer: Instituut voor Nederlandse Lexicologie

Impact Dutch Institutional Dataset

  • Description: The image collection for the Dutch language is provided by the National Library of the Netherlands. The dataset consists of 88,192 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: No restriction (except newspapers: no usage allowed)
  • Language: Dutch
  • Developer: National Library of the Netherlands

Impact English Demonstrator Dataset

  • Description: The English ground truth produced by the British Library (BL) within the EU-funded IMPACT project consists of 2,775 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: pending
  • Language: English
  • Developer: The British Library

Impact English Historical Lexicon

  • Description: The Historical Lexicon of English covers the period from 1497 to 1900; the material used consists of books, newspapers and papers. The English IR lexicon was built with the IMPACT dictionary attestation tool from the quotations in the OED (Oxford English Dictionary). The lexicon currently contains 874,311 lemma/word form combinations.
  • Group: Data
  • Type: Language resources
  • Subtype: Historical lexicon
  • License: pending
  • Language: English
  • Developer: Instituut voor Nederlandse Lexicologie

Impact English Institutional Dataset

  • Description: The image collection for the English language is provided by the British Library. The dataset consists of 48,515 high-resolution images.
  • Group: Data
  • Type: Images
  • Subtype: -
  • License: pending
  • Language: English
  • Developer: The British Library
