Tools for text digitisation

More than 250 state-of-the-art tools for text digitisation.

283 results

Tools

WCRFT

  • Description: WCRFT (Wrocław CRF Tagger) is a simple morpho-syntactic tagger for Polish that produces state-of-the-art results. The tagger combines tiered tagging, conditional random fields (CRF), and features written in WCCL that are tailored for inflective languages. The algorithm and code are inspired by the Wrocław Memory-Based Tagger. WCRFT uses the CRF++ API as the underlying CRF implementation. Tiered tagging is assumed: grammatical class is disambiguated first, then the subsequent attributes (as defined in a config file) are handled. Each attribute is treated with a separate CRF and may be supplied with a different set of feature templates.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: GPL
  • Language: Polish
  • Developer: http://nlp.pwr.wroc.pl/redmine/
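
The tiered scheme described above can be illustrated with a minimal sketch. This is not WCRFT's code and does not use CRF++: the per-attribute CRF is replaced by a hypothetical stand-in (`pick_value`) that simply prefers the value with the highest prior count, and the tier order, attribute names, and tag values are assumptions made for the example.

```python
from collections import Counter

# Hypothetical tier order; in WCRFT the tiers come from a config file.
TIERS = ["class", "case", "number"]


def pick_value(attribute, candidates, prior):
    """Stand-in for a per-attribute CRF: pick the value with the highest prior count."""
    values = sorted({c[attribute] for c in candidates if attribute in c})
    if not values:
        return None  # no remaining candidate carries this attribute
    return max(values, key=lambda v: prior[attribute][v])


def tiered_disambiguate(candidates, prior):
    """Narrow the candidate interpretations of one token, one tier at a time."""
    remaining = list(candidates)
    for attribute in TIERS:
        chosen = pick_value(attribute, remaining, prior)
        if chosen is None:
            continue
        # Keep only interpretations that agree with the decision for this tier.
        remaining = [c for c in remaining if c.get(attribute) == chosen]
    return remaining[0]


# Toy morphological analyses for a single ambiguous token.
candidates = [
    {"class": "subst", "case": "nom", "number": "sg"},
    {"class": "subst", "case": "acc", "number": "sg"},
    {"class": "adj", "case": "nom", "number": "sg"},
]
# Toy counts standing in for trained models, one per tier.
prior = {
    "class": Counter(subst=5, adj=2),
    "case": Counter(nom=3, acc=4),
    "number": Counter(sg=1),
}
result = tiered_disambiguate(candidates, prior)
```

Each tier only chooses among values still compatible with earlier decisions, which is why the grammatical class is settled before case or number.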

WCRFT (Wrocław CRF Tagger)

  • Description: WCRFT is a simple morpho-syntactic tagger for Polish that produces state-of-the-art results. The tagger combines tiered tagging, conditional random fields (CRF), and features written in WCCL that are tailored for inflective languages. The algorithm and code are inspired by the Wrocław Memory-Based Tagger. WCRFT uses the CRF++ API as the underlying CRF implementation. Tiered tagging is assumed: grammatical class is disambiguated first, then the subsequent attributes (as defined in a config file) are handled. Each attribute is treated with a separate CRF and may be supplied with a different set of feature templates.
  • Group: Text Processing
  • Type: -
  • Subtype: CRF tagger
  • License: unknown
  • Language: Polish
  • Developer: The WrocUT Language Technology Group G4.19

WMBT

  • Description: WMBT (Wrocław Memory-Based Tagger) is a simple morpho-syntactic tagger for Polish that produces state-of-the-art results. WMBT uses the TiMBL API as the underlying memory-based learning implementation. The features for classification are generated using WCCL.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: POS Tagger
  • License: Unclear
  • Language: Polish
  • Developer: http://nlp.pwr.wroc.pl/redmine/

WeOCR

  • Description: WeOCR is a platform for Web-enabled OCR (Optical Character Reader/Recognition) systems. It enables people to use character recognition over networks. A WeOCR server receives document images from users, recognizes text in the images, and returns the recognition results to the users. WeOCR does not have its own character recognition engine. Instead, it is intended to accommodate various existing character recognition engines.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: Web service
  • License: ASL 2.0
  • Language: -
  • Developer: -

Word Spotting

  • Description: This tool provides an integrated GUI for indexing historical documents without an OCR engine. It works by segmenting documents into individual words and compiling a list of the most common words (keywords) in the text. Users are then asked to classify the keywords
  • Group: Text Recognition
  • Type: -
  • Subtype:
  • License: commercial
  • Language: Not applicable
  • Developer: National Center for Scientific Research (NCSR) "Demokritos"

WordFreak

  • Description: WordFreak is a Java-based linguistic annotation tool designed to support human and automatic annotation of linguistic data, as well as to employ active learning for human correction of automatically annotated data.
  • Group: Text Processing
  • Type: -
  • Subtype: annotation tool
  • License: Mozilla Public License 1.1 (MPL 1.1)
  • Language: -
  • Developer: Thomas Morton, Jeremy LaCivita

Wordsnap OCR

  • Description: An app for OCR-based camera input on Android.
  • Group: Text Recognition
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPLv3
  • Language: -
  • Developer: -

XMLtotext

  • Description: Performs various conversions from XML to plain text (TXT).
  • Group: text processing
  • Type: language resources
  • Subtype: 0
  • License:
  • Language: n/a
  • Developer:
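
As a rough illustration of what an XML-to-text conversion involves, the sketch below strips markup and keeps only the text content. It uses Python's standard `xml.etree.ElementTree` module and is not the XMLtotext tool itself; the function name `xml_to_text` and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string):
    """Return the text content of an XML document with all markup removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields the text and tail strings of every element in document order.
    pieces = (piece.strip() for piece in root.itertext())
    # Drop whitespace-only pieces and join the rest with single spaces.
    return " ".join(piece for piece in pieces if piece)


sample = "<doc><title>Hello</title><p>digitised <b>text</b></p></doc>"
plain = xml_to_text(sample)
```

Real conversions usually need format-specific rules (for example, which elements start a new line), which this sketch leaves out.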

XPLab

  • Description: XPLab tries to recognize patterns in a scanned document image using trained templates stored in a database. The main phases are training, recognition, and maintenance. The user can switch easily between all phases in the same session. Some effort is made to simplify the training phase, which is the most time-consuming part of the interaction.
  • Group: Text Processing
  • Type: Core Text Recognition
  • Subtype: -
  • License: GPL
  • Language: -
  • Developer: -

Xerox

  • Description: This service will tell you the language your document is written in. Language identification is often the first necessary step in a whole line of document processing.
  • Group: Text Processing
  • Type: NLP Tools
  • Subtype: Language Identification
  • License: commercial
  • Language: 47
  • Developer: http://open.xerox.com/

Xsltproc

  • Description: Xsltproc is a command line tool for applying XSLT stylesheets to XML documents. It is part of libxslt, the XSLT C library for GNOME.
  • Group: text processing
  • Type: language resources
  • Subtype: 0
  • License:
  • Language: n/a
  • Developer:
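
For batch work, xsltproc is often called from a script. The sketch below, which assumes `xsltproc` is installed and on the `PATH`, builds the command line `xsltproc [-o output] stylesheet file` and runs it with Python's `subprocess` module. The helper names and the file names in the example are hypothetical.

```python
import subprocess


def build_xsltproc_command(stylesheet, source, output=None):
    """Build the argument list for: xsltproc [-o output] stylesheet source."""
    command = ["xsltproc"]
    if output is not None:
        # -o writes the result to a file instead of standard output.
        command += ["-o", output]
    command += [stylesheet, source]
    return command


def transform(stylesheet, source, output=None):
    """Run xsltproc and return whatever it prints to standard output."""
    result = subprocess.run(
        build_xsltproc_command(stylesheet, source, output),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical file names, shown only to illustrate the argument order.
command = build_xsltproc_command("to-text.xsl", "page.xml", output="page.txt")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.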

abbot

  • Description: Abbot is a tool for large-scale conversion of XML document collections in order to make them interoperable with one another. It is built on Java technology.
  • Group: metadata processing
  • Type: nlp tools
  • Subtype: format conversion (xml)
  • License:
  • Language: n/a
  • Developer: Center for Digital Research in the Humanities at the University of Nebraska-Lincoln

