Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

Hot Metal Font Enhancer

  • Description: Font enhancement of prints produced by hot metal typesetting, allowing higher OCR accuracy.
  • Group: Image Processing
  • Type: Image Processing and Enhancement
  • Subtype: -
  • License: commercial
  • Language: -
  • Developer: Fraunhofer IAIS

IBM Adaptive OCR Engine

  • Description: IBM Adaptive OCR is a comprehensive software system that significantly improves the recognition of historical texts by making adaptivity one of the main features of the text recognition process. It integrates several other tools, such as the image enhancement toolkit, the ABBYY FineReader Engine, the post-correction tool and the lexical resources developed during the IMPACT project.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: commercial
  • Language: English, Dutch, German
  • Developer: IBM Israel - Science and Technology Ltd

IMPACT Digitisation Cost Estimator

  • Description: Estimates the overall cost of undertaking a digitisation project.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: -
  • License: unknown
  • Language: -
  • Developer: Göttingen State and University Library

IMPACT Digitisation Storage Estimator

  • Description: Estimates the overall storage required by a digitisation project.
  • Group: Miscellaneous Utilities
  • Type: -
  • Subtype: -
  • License: unknown
  • Language: -
  • Developer: Münchener Digitalisierungszentrum
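The entry does not describe how the estimator works. As a hedged illustration of the basic arithmetic behind any such estimate, the uncompressed size of one scanned page is its width in pixels × height in pixels × bit depth / 8 bytes. The sketch below applies only that formula; the function names, page size, resolution and bit depth are assumptions, and a real estimate would also account for compression, derivative files, metadata and backup copies.

```python
def page_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Uncompressed image data for one page, in bytes (no file headers, no row padding)."""
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    return width_px * height_px * bit_depth // 8


def project_bytes(pages: int, width_in: float, height_in: float,
                  dpi: int, bit_depth: int) -> int:
    """Total uncompressed storage for a project of identically sized pages."""
    return pages * page_bytes(width_in, height_in, dpi, bit_depth)


if __name__ == "__main__":
    # Hypothetical project: 10,000 A4 pages (8.27 x 11.69 in) at 300 dpi, 24-bit colour.
    total = project_bytes(10_000, 8.27, 11.69, 300, 24)
    print(f"{total / 1e9:.1f} GB")  # 261.0 GB
```

One A4 page at 300 dpi in 24-bit colour comes to about 26.1 MB of raw image data, so even a modest project reaches hundreds of gigabytes before compression.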

INL Lemmatizer

  • Description: Tagger-lemmatizer for historical Dutch developed by INL. The tagger is trained on the “Letters as Loot” corpus, and the lemmatizer is based on the INL historical lexicon.
  • Group: Text Processing
  • Type: Language Resources
  • Subtype: -
  • License: -
  • Language: Dutch
  • Developer: INL

INL word evaluation

  • Description: Performs word-level evaluation of OCR output by comparing the results in PAGE format with the ground truth.
  • Group: Text Processing
  • Type: Language Resources
  • Subtype: evaluation
  • License: -
  • Language: n/a
  • Developer: -
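The entry does not say how words are scored. One common approach, sketched below, counts the ground-truth words that the OCR output reproduces in the same order (the longest common subsequence of the two word lists) and divides by the number of ground-truth words. It assumes the text has already been extracted from the PAGE files into plain strings with whitespace-separated words; the function names are hypothetical and this is not INL's implementation.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two word lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)           # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping either word
        prev = curr
    return prev[-1]


def word_accuracy(ocr_text: str, gt_text: str) -> float:
    """Share of ground-truth words that the OCR output reproduces in the same order."""
    ocr_words = ocr_text.split()
    gt_words = gt_text.split()
    if not gt_words:
        raise ValueError("ground truth contains no words")
    return lcs_length(gt_words, ocr_words) / len(gt_words)


print(word_accuracy("tbe quick brown fox", "the quick brown fox"))  # 0.75
```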

IOBBER (chunker)

  • Description: IOBBER is a chunker for Polish. Its job is to recognise syntactic phrases (chunks) in Polish text. The name comes from the IOB tags that are assigned to tokens to represent chunks (strictly speaking, the IOB2 representation is used). Here is an example sentence annotated with NP and VP chunks:
      [Dziennikarka]NP [zarzucała]VP [Rutkowskiemu]NP [to]NP że [całe jego działanie ws. zaginięcia]NP [to]VP [„show”]NP
    IOBBER is a reimplementation of the CRF++ chunker available in Disaster.
  • Group: Text Processing
  • Type: -
  • Subtype: chunker
  • License: unknown
  • Language: Polish
  • Developer: The WrocUT Language Technology Group G4.19
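To make the IOB2 representation concrete, the sketch below converts the first part of the example sentence from bracketed chunks into per-token tags. The tokenization (for instance keeping “ws.” as one token) and the function name are assumptions for illustration; IOBBER's own input and output formats are not described here.

```python
def to_iob2(chunks):
    """Convert (tokens, label) chunks to (token, tag) pairs in IOB2 notation.

    label is a chunk type such as "NP" or "VP", or None for tokens outside any chunk.
    In IOB2 every chunk starts with B-, so adjacent chunks of the same type stay distinct.
    """
    tagged = []
    for tokens, label in chunks:
        for i, token in enumerate(tokens):
            if label is None:
                tagged.append((token, "O"))
            elif i == 0:
                tagged.append((token, "B-" + label))
            else:
                tagged.append((token, "I-" + label))
    return tagged


sentence = [
    (["Dziennikarka"], "NP"),
    (["zarzucała"], "VP"),
    (["Rutkowskiemu"], "NP"),
    (["to"], "NP"),
    (["że"], None),
    (["całe", "jego", "działanie", "ws.", "zaginięcia"], "NP"),
]

for token, tag in to_iob2(sentence):
    print(f"{token}\t{tag}")
```

The adjacent chunks [Rutkowskiemu]NP and [to]NP both receive B-NP, so the boundary between them survives in the tag sequence.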

ISRI Tools

  • Description: Images, ground-truth text and zone files for several thousand English and some Spanish pages used in the UNLV/ISRI annual tests of OCR accuracy between 1992 and 1996, together with the source code of the OCR evaluation tools used in those tests.
  • Group: Evaluation
  • Type: OCR (text)
  • Subtype: evaluation
  • License: ASL 2.0
  • Language: -
  • Developer: -
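The UNLV/ISRI tests report character accuracy as (n - errors) / n, where n is the number of characters in the ground truth and errors is the minimum number of character insertions, deletions and substitutions needed to turn the OCR output into the ground truth. The sketch below computes that measure with a textbook Levenshtein distance; it is an independent illustration with hypothetical function names, not code taken from the ISRI tools.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, gt_text: str) -> float:
    """(n - errors) / n with n ground-truth characters; negative if the OCR adds much noise."""
    n = len(gt_text)
    if n == 0:
        raise ValueError("ground truth is empty")
    return (n - levenshtein(ocr_text, gt_text)) / n


print(levenshtein("kitten", "sitting"))  # 3
```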

Iconv

  • Description: Converts text between character encodings using iconv.
  • Group: Text Processing
  • Type: Language Resources
  • Subtype: format transformation
  • License: -
  • Language: n/a
  • Developer: -
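The iconv command-line utility takes the source and target encodings through its -f and -t options, for example iconv -f ISO-8859-1 -t UTF-8 input.txt > output.txt. A minimal Python sketch of the same conversion decodes the bytes with the source codec and re-encodes them with the target codec. The function names are hypothetical, and the sketch simply raises an error on input it cannot convert.

```python
from pathlib import Path


def convert_encoding(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Re-encode raw bytes from one character encoding to another."""
    # decode() raises UnicodeDecodeError on bytes that are invalid in from_enc;
    # encode() raises UnicodeEncodeError on characters that to_enc cannot represent.
    return data.decode(from_enc).encode(to_enc)


def convert_file(src: str, dst: str, from_enc: str, to_enc: str) -> None:
    """Hypothetical helper: read src, convert it and write the result to dst."""
    Path(dst).write_bytes(convert_encoding(Path(src).read_bytes(), from_enc, to_enc))


# "Göttingen" in ISO-8859-1 (one byte for ö) becomes two bytes for ö in UTF-8.
print(convert_encoding(b"G\xf6ttingen", "iso-8859-1", "utf-8"))  # b'G\xc3\xb6ttingen'
```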

ImageMagick / GraphicsMagick

  • Description: ImageMagick is a software suite to create, edit, compose or convert bitmap images. GraphicsMagick, billed as the “swiss army knife of image processing”, was derived from ImageMagick 5.5.2.
  • Group: Image Processing
  • Type: Image Processing and Enhancement
  • Subtype: -
  • License: -
  • Language: -
  • Developer: ImageMagick Studio / GraphicsMagick Group

ImpacTok Tokenizer

  • Description: The tokenizer is used to pre-process the documents that form the corpus from which the lexicon is built. Tokenization is the process of breaking a stream of text down into words or tokens. This tokenizer is based on ILKTOK, part of the ‘Tadpole’ language processing suite (ilk.uvt.nl/software/). The code was rewritten to produce the output required by the database behind the IMPACT Lexicon and to make the design more modular.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: tokenizer
  • License: -
  • Language: n/a
  • Developer: IVdNT
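As a rough sketch of what tokenization involves, the regular-expression tokenizer below splits text into words and punctuation and records each token's character offsets. Its rule set (letters and digits, optionally joined by internal hyphens or apostrophes) is a hypothetical simplification and does not reproduce ILKTOK's or ImpacTok's rules.

```python
import re

# Hypothetical rule set: a word is a run of letters/digits, optionally joined by
# internal hyphens or apostrophes; any other non-space character is its own token.
TOKEN_RE = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]")


def tokenize(text: str) -> list[tuple[str, int, int]]:
    """Return (token, start, end) triples with character offsets into text."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


print(tokenize("a, b"))  # [('a', 0, 1), (',', 1, 2), ('b', 3, 4)]
```

Keeping offsets lets each token be traced back to its position in the source document. A rule set this simple splits abbreviations such as “ws.” into two tokens, which is the kind of case a dedicated tokenizer needs extra rules for.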

Impact Bulgarian Demonstrator Dataset

  • Description: The Bulgarian ground truth produced by the National Library of Bulgaria (NLB) within the EU-funded IMPACT project consists of 1,276 pages in PAGE XML format with an accuracy of 99.95%.
  • Group: Data
  • Type: Groundtruth
  • Subtype: -
  • License: CC-BY-NC-ND
  • Language: Bulgarian
  • Developer: National Library of Bulgaria


