Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

Pantera

  • Description: PANTERA is a Brill tagger for morphologically rich languages, e.g. Polish.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: GPL
  • Language: Polish
  • Developer: http://zil.ipipan.waw.pl/

Paradiit

  • Description: The PaRADIIT (Pattern Redundancy Analysis for Document Image Indexing and Transcription) project is a research project conducted by the RFAI Team of the Computer Science Laboratory of Tours. The project focused on layout analysis, text/graphics separation, Optical Character Recognition (OCR) and text transcription processes dedicated to old books and historical documents. Additions: This is very much like the IBM concert tool and also has ideas related to inventory extraction. It consists of two processing steps: AGORA, which extracts clusters of characters, and RETRO, which presents something like IBM's carpets.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: Framework
  • License: GPL
  • Language: -
  • Developer: -

Photoscore

  • Description: Music OCR: music scanning and PDF to notation.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: commercial
  • Language: -
  • Developer: Neuratron

Plasma OCR

  • Description: An omnifont OCR engine. The long-term goal is recognition of formulas.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPL
  • Language: -
  • Developer: -

Polyglot 3000

  • Description: Polyglot 3000 is an automatic language identifier that quickly recognizes the language of any text phrase or even of single words. It is available for Windows 95/98/NT/ME/2000/XP/2003/Vista/2008/7/8.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: unknown
  • Language: More than 400
  • Developer: http://www.polyglot3000.com/developers.shtml

PrimeOCR

  • Description: Prime Recognition's production OCR product PrimeOCR is a Windows OCR engine that claims to reduce OCR error rates by up to 65-80% over conventional OCR by implementing "Voting" OCR technology.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: OCR
  • License: Commercial
  • Language: Danish, English, German, Norwegian, Spanish, Dutch, French, Italian, Portuguese, Swedish
  • Developer: PrimeRecognition

Proofread page

  • Description: Proofread Page is an extension for MediaWiki which allows you to edit transcriptions side by side with the page images. It is used on WikiSource for manuscript and early print transcription projects. Proofread Page supports workflow but no markup.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: GPL v2
  • Language: -
  • Developer: ThomasV (original author), Tpt (current maintainer)

ReadIris

  • Description: Readiris is an OCR solution designed for private users and for small to large offices.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: OCR
  • License: Commercial
  • Language: 140 languages
  • Developer: IRIS

Rescribe OCR

  • Description: Rescribe's open source Latin OCR software is based on Google's Tesseract and has been developed particularly for text recognition of historic Latin printed texts. Detailed instructions and additional helpful open source tools for Windows, Linux and OSX can be found on latinocr.org.
  • Group: Text Recognition
  • Type: OCR (text)
  • Subtype: -
  • License: -
  • Language: Latin
  • Developer: Nick White, Antonia Karaisl

Rosette

  • Description: Automatically detects the language of any digital text. Rosette® Language Identifier analyzes text, identifying the language and the character encoding scheme. Detecting the language of documents is a critical first step in any process that handles multilingual text. Our software recognizes 55 languages and 45 encodings and processes files extremely quickly and accurately.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: commercial
  • Language: 55
  • Developer: http://www.basistech.com/

Rosette Base Linguistics

  • Description: Sophisticated morphological analysis, segmentation and tagging of Arabic, Asian and European language text.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Tokenizer
  • License: Commercial
  • Language: 40
  • Developer: http://www.basistech.com/
