Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

289 results

Tools

Typereader

  • Description: TypeReader® has been on the global market since 1991 and has received hundreds of appraisals from industry technology magazines. At the heart of this award-winning OCR software is ExperVision®'s OpenRTK®, the only OCR engine which won the UNLV Test for consecutive years. Commercial (server/desktop).
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: Own license
  • Language: -
  • Developer: -

Typewright

  • Description: TypeWright is a tool for correcting the text version of a document made up of page images.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: ASL 2.0
  • Language: English
  • Developer: -

VARD 2

  • Description: VARD 2 is an interactive piece of software written in Java, designed to assist users of historical corpora in dealing with spelling variation, particularly in Early Modern English (EModE) texts.
  • Group: Text Processing
  • Type: -
  • Subtype: spelling variations
  • License: Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License.
  • Language: Early Modern English, but can be extended via plugins
  • Developer: Lancaster University

Virtual Transcription Laboratory

  • Description: Virtual Transcription Laboratory is a Virtual Research Environment which works as a crowdsourcing platform for developing high-quality textual representations of digital documents. It gives access to an online OCR service and an easy-to-use transcription editor. Images can be imported from various sources, including direct import from digital libraries.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: free
  • Language: -
  • Developer: Poznań Supercomputing and Networking Center

WCRFT

  • Description: WCRFT (Wrocław CRF Tagger) is a simple morpho-syntactic tagger for Polish producing state-of-the-art results. The tagger combines tiered tagging, conditional random fields (CRF) and features tailored for inflective languages written in WCCL. The algorithm and code are inspired by the Wrocław Memory-Based Tagger. WCRFT uses the CRF++ API as the underlying CRF implementation. Tiered tagging is assumed: grammatical class is disambiguated first, then subsequent attributes (as defined in a config file) are taken care of. Each attribute is treated with a separate CRF and may be supplied with a different set of feature templates.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: GPL
  • Language: Polish
  • Developer: http://nlp.pwr.wroc.pl/redmine/
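
The tiered scheme described for WCRFT can be illustrated with a minimal Python sketch. It is not WCRFT's code or API and does not use CRF++: each tier's CRF is replaced by a toy preference function, and the names `tiered_disambiguate` and `prefer`, the attribute names and the candidate tags are all hypothetical, chosen only to show the order of decisions.

```python
from typing import Callable, Dict, List, Sequence

# A tag maps attribute names to values, e.g. {"class": "subst", "case": "nom"}.
Tag = Dict[str, str]
# A tier model picks one value for its attribute from the surviving candidates.
# In WCRFT this role is played by a separate CRF per attribute; here it is a stub.
TierModel = Callable[[str, List[Tag]], str]


def tiered_disambiguate(token: str, candidates: Sequence[Tag],
                        tiers: Sequence[str],
                        models: Dict[str, TierModel]) -> List[Tag]:
    """Narrow the candidate tags one attribute (tier) at a time."""
    remaining = list(candidates)
    for attr in tiers:
        values = {tag.get(attr, "-") for tag in remaining}
        if len(values) < 2:
            continue  # nothing left to decide on this tier
        chosen = models[attr](token, remaining)
        filtered = [tag for tag in remaining if tag.get(attr, "-") == chosen]
        if filtered:  # ignore a choice that would discard every candidate
            remaining = filtered
    return remaining


def prefer(attr: str, order: Sequence[str]) -> TierModel:
    """Toy stand-in for a trained model: take the first preferred value present."""
    def model(token: str, tags: List[Tag]) -> str:
        present = {tag.get(attr, "-") for tag in tags}
        for value in order:
            if value in present:
                return value
        return tags[0].get(attr, "-")
    return model


# Hand-made candidate analyses for one token (illustrative only).
candidates = [
    {"class": "subst", "case": "nom", "number": "sg"},
    {"class": "subst", "case": "acc", "number": "sg"},
    {"class": "adj", "case": "nom", "number": "sg"},
]
tiers = ["class", "case", "number"]  # grammatical class first, then attributes
models = {
    "class": prefer("class", ["subst", "adj"]),
    "case": prefer("case", ["acc", "nom"]),
    "number": prefer("number", ["sg", "pl"]),
}
result = tiered_disambiguate("dom", candidates, tiers, models)
print(result)
```

Because every tier only filters the candidates left by the previous one, a later attribute is never decided for a grammatical class that has already been ruled out.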

WCRFT (Wrocław CRF Tagger)

  • Description: WCRFT is a simple morpho-syntactic tagger for Polish producing state-of-the-art results. The tagger combines tiered tagging, conditional random fields (CRF) and features tailored for inflective languages written in WCCL. The algorithm and code are inspired by the Wrocław Memory-Based Tagger. WCRFT uses the CRF++ API as the underlying CRF implementation. Tiered tagging is assumed: grammatical class is disambiguated first, then subsequent attributes (as defined in a config file) are taken care of. Each attribute is treated with a separate CRF and may be supplied with a different set of feature templates.
  • Group: Text Processing
  • Type: -
  • Subtype: CRF tagger
  • License: unknown
  • Language: Polish
  • Developer: The WrocUT Language Technology Group G4.19

WMBT

  • Description: WMBT (Wrocław Memory-Based Tagger) is a simple morpho-syntactic tagger for Polish producing state-of-the-art results. WMBT uses the TiMBL API as the underlying Memory-Based Learning implementation. The features for classification are generated using WCCL.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: Unclear
  • Language: Polish
  • Developer: http://nlp.pwr.wroc.pl/redmine/

WeOCR

  • Description: WeOCR is a platform for Web-enabled OCR (Optical Character Reader/Recognition) systems. It enables people to use character recognition over networks. A WeOCR server receives document images from users, recognizes text in the images, and returns the recognition results to the users. WeOCR does not have its own character recognition engine. Instead, it is intended to accommodate various existing character recognition engines.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: Web service
  • License: ASL 2.0
  • Language: -
  • Developer: -
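
The idea of a server that has no recognition engine of its own and instead delegates to existing engines can be sketched as follows. This is a hypothetical illustration, not WeOCR's actual protocol or API: the class `OcrGateway`, its methods and the stand-in `fake_engine` are invented for this example, and the network layer is left out entirely.

```python
from typing import Callable, Dict

# An engine adapter takes raw image bytes and returns the recognised text.
Engine = Callable[[bytes], str]


class OcrGateway:
    """Routes each request to one of several pluggable recognition engines."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def register(self, name: str, engine: Engine) -> None:
        """Make an existing engine available under a short name."""
        self._engines[name] = engine

    def recognize(self, image: bytes, engine: str) -> str:
        """Validate the request, then delegate to the chosen engine."""
        if engine not in self._engines:
            raise KeyError(f"unknown engine: {engine}")
        if not image:
            raise ValueError("empty image")
        return self._engines[engine](image)


def fake_engine(image: bytes) -> str:
    # Stand-in for a real OCR engine; it only reports the input size.
    return f"<{len(image)} bytes received>"


gateway = OcrGateway()
gateway.register("fake", fake_engine)
reply = gateway.recognize(b"not a real image", "fake")
print(reply)
```

A real deployment would wrap `recognize` in a web request handler and register adapters around actual engines; the gateway itself stays independent of any one of them.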

Word Spotting

  • Description: This tool provides an integrated GUI for indexing historical documents without an OCR engine. It works by segmenting documents into individual words and compiling a list of the most common words (keywords) in the text. Users are then asked to classify the keywords.
  • Group: Text Recognition
  • Type: -
  • Subtype: -
  • License: commercial
  • Language: Not applicable
  • Developer: National Center for Scientific Research (NCSR) "Demokritos"

WordFreak

  • Description: WordFreak is a Java-based linguistic annotation tool designed to support human and automatic annotation of linguistic data, as well as to employ active learning for human correction of automatically annotated data.
  • Group: Text Processing
  • Type: -
  • Subtype: annotation tool
  • License: Mozilla Public License 1.1 (MPL 1.1)
  • Language: -
  • Developer: Thomas Morton, Jeremy LaCivita

Wordsnap OCR

  • Description: An app for OCR-based camera input on Android.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv3
  • Language: -
  • Developer: -

XMLtotext

  • Description: Performs various conversions from XML to TXT.
  • Group: Text Processing
  • Type: Language Resources
  • Subtype: -
  • License: -
  • Language: n/a
  • Developer: -

