Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

NewOCR

  • Description: NewOCR.com is a free online OCR service based on Tesseract. It analyzes the text in any image file that you upload and converts it into text that you can easily edit on your computer.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: Own license
  • Language: Same as Tesseract 3; see also website
  • Developer: -

O2

  • Description: Library with methods developed for document analysis and recognition
  • Group: Layout Analysis
  • Type: -
  • Subtype: Framework
  • License: Own license
  • Language: -
  • Developer: -

OCR gem

  • Description: Recognize text and characters from image files using web services.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: MIT
  • Language: brazilian, byelorussian, bulgarian, catalan, croatian, czech, danish, dutch, english, estonian, finnish, french, german, greek, hungarian, indonesian, italian, latin, latvian, lithuanian, moldavian, polish, portuguese, romanian, russian, serbian, slovakian, slovenian, spanish, swedish, turkish, ukrainian
  • Developer: -

OCRFeeder

  • Description: OCRFeeder is a document layout analysis and optical character recognition system
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPL
  • Language: -
  • Developer: The GNOME Project

OCRchie

  • Description: The original OCR package could learn from a TIFF file and its ASCII translation, then recognize a document in the same font. This semester we added interactive learning, interactive segmentation of mathematics, page zoning (the ability to automatically or manually zone columns or regions of text), and interactive read-order specification.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: unknown
  • Language: -
  • Developer: -

OCRopodium

  • Description: As part of the Ocropodium project at KCL's Centre for e-Research, we're investigating OCR workflows for digitising historical collections. In the course of experimenting with Ocropus, Tesseract, and other software, we've developed some tools and utilities that might be of interest to others. Currently there's a Django web application for performing batch OCR, a Qt GUI for correcting ground-truth transcripts from Ocropus bookstores, and a viewer for previewing its page segmentation results.
  • Group: Text Processing
  • Type: Core Text Recognition
  • Subtype: Framework
  • License: ASL 2.0
  • Language: -
  • Developer: -

Olena

  • Description: A platform dedicated to image processing and pattern recognition. Its core component is a generic and efficient C++ library called Milena. Milena provides a framework to implement simple, fast, safe, reusable, and extensible image processing tool chains.
  • Group: Layout Analysis
  • Type: -
  • Subtype:
  • License: GPLv2
  • Language: -
  • Developer: EPITA Research & Development Laboratory (LRDE)

OmniPage

  • Description: State-of-the-art OCR engine
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: commercial
  • Language: 123 languages
  • Developer: Nuance

OpenJPEG Conversion

  • Description: Perform conversion from JPEG2000 to TIFF image file format or vice versa. Implementation is based on the OpenJPEG library.
  • Group: image processing
  • Type: image processing and enhancement
  • Subtype: image enhancement
  • License:
  • Language: n/a
  • Developer:

OxGarage

  • Description: OxGarage is a web and RESTful service to manage the transformation of documents between a variety of formats. The majority of transformations use the Text Encoding Initiative format as a pivot format.
  • Group: Metadata Processing
  • Type: -
  • Subtype: Format transformation
  • License: unknown
  • Language: -
  • Developer: Sebastian Rahtz (Oxford University Computing Services)

PAGE XML exporter

  • Description: Exports FR10 format to PAGE XML
  • Group: text processing
  • Type: ocr (text)
  • Subtype: format transformation (xml)
  • License:
  • Language: n/a
  • Developer:

Pandoc

  • Description: Document conversion engine that converts between markup formats (for example Markdown, HTML, and LaTeX)
  • Group: Metadata Processing
  • Type: -
  • Subtype: Format transformation (XML)
  • License: GNU GPL
  • Language: -
  • Developer: John MacFarlane

