Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

286 results

Tools

VARD 2

  • Description: VARD 2 is an interactive piece of software, written in Java, designed to help users of historical corpora deal with spelling variation, particularly in Early Modern English (EModE) texts.
  • Group: Text Processing
  • Type: -
  • Subtype: spelling variations
  • License: Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License.
  • Language: Early Modern English, but can be extended via plugins
  • Developer: Lancaster University

Virtual Transcription Laboratory

  • Description: Virtual Transcription Laboratory is a Virtual Research Environment that works as a crowdsourcing platform for developing high-quality textual representations of digital documents. It gives access to an online OCR service and an easy-to-use transcription editor. Images can be imported from various sources, including direct import from digital libraries.
  • Group: Text Recognition
  • Type: Postcorrection
  • Subtype: -
  • License: free
  • Language: -
  • Developer: Poznań Supercomputing and Networking Center

WCRFT

  • Description: WCRFT (Wrocław CRF Tagger) is a simple morpho-syntactic tagger for Polish producing state-of-the-art results. The tagger combines tiered tagging, conditional random fields (CRF), and features tailored for inflective languages, written in WCCL. The algorithm and code are inspired by the Wrocław Memory-Based Tagger. WCRFT uses the CRF++ API as the underlying CRF implementation. Tiered tagging is assumed: grammatical class is disambiguated first, then subsequent attributes (as defined in a config file) are taken care of. Each attribute is handled by a separate CRF and may be supplied with a different set of feature templates.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: GPL
  • Language: Polish
  • Developer: http://nlp.pwr.wroc.pl/redmine/
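The tiered scheme described in this entry can be sketched roughly as follows. This is not WCRFT's code: the real tagger trains one CRF per attribute through CRF++ and generates features with WCCL, whereas here each tier is represented by a hypothetical scoring function, and the names `tiered_disambiguate`, `tiers`, and `scorers` are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence

# A candidate tag is a mapping from attribute name to value,
# e.g. {"class": "subst", "case": "gen"}.
Tag = Dict[str, str]
# Hypothetical stand-in for a per-attribute model: (word, value) -> score.
Scorer = Callable[[str, str], float]


def tiered_disambiguate(
    word: str,
    candidates: Sequence[Tag],
    tiers: Sequence[str],
    scorers: Dict[str, Scorer],
) -> Tag:
    """Pick one tag by resolving attributes tier by tier.

    The first tier is expected to be the grammatical class; each later
    tier only chooses among the candidates that survived earlier tiers.
    """
    if not candidates:
        raise ValueError("at least one candidate tag is required")
    remaining: List[Tag] = list(candidates)
    for attr in tiers:
        # Values of this attribute still possible after earlier tiers.
        values = sorted({tag[attr] for tag in remaining if attr in tag})
        if not values:
            continue  # attribute does not apply to the surviving tags
        scorer = scorers[attr]
        best = max(values, key=lambda value: scorer(word, value))
        remaining = [tag for tag in remaining if tag.get(attr) == best]
    return remaining[0]


if __name__ == "__main__":
    toy_scorers: Dict[str, Scorer] = {
        "class": lambda w, v: {"subst": 0.9, "adj": 0.1}.get(v, 0.0),
        "case": lambda w, v: {"gen": 0.6, "acc": 0.4}.get(v, 0.0),
    }
    toy_candidates = [
        {"class": "adj", "case": "gen"},
        {"class": "subst", "case": "acc"},
        {"class": "subst", "case": "gen"},
    ]
    print(tiered_disambiguate("kota", toy_candidates, ["class", "case"], toy_scorers))
```

Because each tier filters the candidate set before the next one runs, a later attribute model never has to consider values that are incompatible with the grammatical class already chosen.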

WCRFT (Wrocław CRF Tagger)

  • Description: WCRFT is a simple morpho-syntactic tagger for Polish producing state-of-the-art results. The tagger combines tiered tagging, conditional random fields (CRF), and features tailored for inflective languages, written in WCCL. The algorithm and code are inspired by the Wrocław Memory-Based Tagger. WCRFT uses the CRF++ API as the underlying CRF implementation. Tiered tagging is assumed: grammatical class is disambiguated first, then subsequent attributes (as defined in a config file) are taken care of. Each attribute is handled by a separate CRF and may be supplied with a different set of feature templates.
  • Group: Text Processing
  • Type: -
  • Subtype: CRF tagger
  • License: unknown
  • Language: Polish
  • Developer: The WrocUT Language Technology Group G4.19

WMBT

  • Description: WMBT (Wrocław Memory-Based Tagger) is a simple morpho-syntactic tagger for Polish producing state-of-the-art results. WMBT uses the TiMBL API as the underlying Memory-Based Learning implementation. The features for classification are generated using WCCL.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: Unclear
  • Language: Polish
  • Developer: http://nlp.pwr.wroc.pl/redmine/

WeOCR

  • Description: WeOCR is a platform for Web-enabled OCR (Optical Character Reader/Recognition) systems. It enables people to use character recognition over networks. A WeOCR server receives document images from users, recognizes text in the images, and returns the recognition results to the users. WeOCR does not have its own character recognition engine; instead, it is intended to accommodate various existing character recognition engines.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: Web service
  • License: ASL 2.0
  • Language: -
  • Developer: -

Word Spotting

  • Description: This tool provides an integrated GUI for indexing historical documents without an OCR engine. It works by segmenting documents into individual words and compiling a list of the most common words (keywords) in the text. Users are then asked to classify the keywords
  • Group: Text Recognition
  • Type: -
  • Subtype:
  • License: commercial
  • Language: Not applicable
  • Developer: National Center for Scientific Research (NCSR) "Demokritos"
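The idea of compiling a list of the most common words without OCR can be illustrated with a minimal sketch. It assumes the word images have already been segmented and reduced to fixed-length numeric descriptors, and it uses a simple greedy distance-threshold clustering as a stand-in; the catalogue entry does not say how the tool actually groups words, so `cluster_word_images`, `top_keywords`, and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Descriptor = Tuple[float, ...]


def cluster_word_images(features: Sequence[Descriptor], threshold: float) -> List[int]:
    """Greedily assign each word-image descriptor to a cluster.

    A descriptor joins the first cluster whose representative lies within
    `threshold` (Euclidean distance); otherwise it starts a new cluster.
    """
    representatives: List[Descriptor] = []
    labels: List[int] = []
    for vec in features:
        for idx, rep in enumerate(representatives):
            dist = sum((a - b) ** 2 for a, b in zip(vec, rep)) ** 0.5
            if dist <= threshold:
                labels.append(idx)
                break
        else:
            # No existing cluster is close enough: open a new one.
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


def top_keywords(labels: Sequence[int], n: int) -> List[Tuple[int, int]]:
    """Return the n largest clusters as (cluster id, occurrence count)."""
    return Counter(labels).most_common(n)


if __name__ == "__main__":
    descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (5.1, 5.0)]
    labels = cluster_word_images(descriptors, threshold=0.5)
    print(top_keywords(labels, 2))
```

Each of the largest clusters would then be shown to the user once for labelling, so a single manual classification indexes every occurrence of that word.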

WordFreak

  • Description: WordFreak is a Java-based linguistic annotation tool designed to support human and automatic annotation of linguistic data, as well as to employ active learning for human correction of automatically annotated data.
  • Group: Text Processing
  • Type: -
  • Subtype: annotation tool
  • License: Mozilla Public License 1.1 (MPL 1.1)
  • Language: -
  • Developer: Thomas Morton, Jeremy LaCivita

Wordsnap OCR

  • Description: An app for OCR-based camera input on Android.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv3
  • Language: -
  • Developer: -

XMLtotext

  • Description: Performs various conversions from XML to TXT.
  • Group: Text Processing
  • Type: language resources
  • Subtype: -
  • License:
  • Language: n/a
  • Developer:
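As an illustration of what an XML-to-TXT conversion involves, here is a minimal sketch using only Python's standard library. It is not the XMLtotext tool itself, whose options are not documented in this entry; the function name `xml_to_text` is an assumption, and the sketch simply drops all markup and normalises whitespace.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string: str) -> str:
    """Return the text content of an XML document with all markup removed.

    Text nodes are concatenated in document order, then runs of
    whitespace (including newlines between elements) are collapsed
    to single spaces.
    """
    root = ET.fromstring(xml_string)
    raw = "".join(root.itertext())
    return " ".join(raw.split())


if __name__ == "__main__":
    sample = "<doc>\n<p>Hello <b>world</b></p>\n<p>Bye</p>\n</doc>"
    print(xml_to_text(sample))  # prints "Hello world Bye"
```

Note that adjacent elements with no whitespace between them are joined without a space, so formats that rely on element boundaries as word separators would need an extra rule.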

XPLab

  • Description: XPLab tries to recognize patterns in a scanned document image using trained templates stored in a database. The main phases are training, recognition, and maintenance. The user can switch easily between all phases in the same session. Some effort is made to simplify the training phase, which is the most time-consuming part of the interaction.
  • Group: Text Processing
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPL
  • Language: -
  • Developer: -

Xerox

  • Description: This service identifies the language your document is written in. Language identification is often the first necessary step in a document-processing pipeline.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: commercial
  • Language: 47 languages
  • Developer: http://open.xerox.com/

