

Scenario


This tool provides an integrated GUI for indexing historical documents without an OCR engine. It allows searching the database for instances of a query keyword using three different methods:
  1. Select the query from a predefined list of keywords.
  2. Define the query by an example.
  3. Type the query as text.
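The page does not describe how query-by-example matching (method 2) is implemented. Purely as an illustration, the sketch below uses a classic word-spotting technique that is not necessarily the one used in this tool. Each binary word image is turned into a sequence of per-column features (ink density, upper and lower ink profile), and candidate words are ranked by their dynamic time warping (DTW) distance to the query image. The image format and the names `column_features`, `dtw_distance` and `rank_by_example` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: a binary word image as rows of 0/1 pixels (1 = ink).
Image = List[List[int]]
Feature = Tuple[float, float, float]


def column_features(img: Image) -> List[Feature]:
    """Describe each pixel column by (ink density, upper profile, lower profile), all in [0, 1]."""
    height = len(img)
    scale = max(height - 1, 1)  # avoid division by zero for one-row images
    feats: List[Feature] = []
    for c in range(len(img[0])):
        ink_rows = [r for r in range(height) if img[r][c]]
        density = len(ink_rows) / height
        if ink_rows:
            upper, lower = ink_rows[0] / scale, ink_rows[-1] / scale
        else:
            upper, lower = 1.0, 0.0  # arbitrary convention for blank columns
        feats.append((density, upper, lower))
    return feats


def dtw_distance(a: Sequence[Feature], b: Sequence[Feature]) -> float:
    """Dynamic time warping with squared Euclidean local cost, normalised by len(a) + len(b)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def rank_by_example(query: Image, words: Dict[str, Image]) -> List[Tuple[str, float]]:
    """Return (word_id, distance) pairs, best match (smallest distance) first."""
    q = column_features(query)
    scored = [(word_id, dtw_distance(q, column_features(img))) for word_id, img in words.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    query = [[0, 1, 1, 0]] * 3
    collection = {
        "same": [[0, 1, 1, 0]] * 3,
        "wider": [[0, 1, 1, 1, 0]] * 3,  # same shape, stretched horizontally
        "other": [[1, 0, 0, 1]] * 3,
    }
    for word_id, dist in rank_by_example(query, collection):
        print(word_id, round(dist, 3))
```

Because DTW can align one query column with several candidate columns, the horizontally stretched "wider" word matches the query as closely as the identical one. A real system would also need to normalise for slant, height and noise, which this sketch omits.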

Abstract


Historical printed documents contain a vast amount of valuable information. Robust indexing of these documents is essential for making historical collections quick and efficient to use. Traditionally, this indexing has been done by means of OCR.

OCR produces its best results on well-printed, modern documents. Historical documents, however, exhibit a range of effects that reduce recognition accuracy: poor paper quality, poor typesetting, damage to or degradation of the original paper, and text skew or warping caused by age or humidity.

The IMPACT Word Spotting tool takes a new approach to overcoming these difficulties. It segments documents into individual words and compiles a list of the most common words (keywords) in the text. Users can then query the collection in three ways:

  • by selecting from a predefined keyword list
  • by providing an image example as a query
  • by typing the query as plain text
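The text does not say how the most common words are found without OCR. One plausible approach, shown here as a hypothetical sketch rather than the tool's actual method, is to describe each segmented word image with a fixed-length feature vector and group similar vectors with simple greedy "leader" clustering. The largest clusters then correspond to the most frequent word shapes and are candidates for the keyword list. The feature vectors, the distance threshold and the function name `frequent_word_clusters` are assumptions.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frequent_word_clusters(
    features: Dict[str, Sequence[float]], threshold: float, top_k: int
) -> List[List[str]]:
    """Group word images whose descriptors lie within `threshold` of a cluster leader.

    Returns the `top_k` largest clusters; the first id in each is its leader
    (a representative image that could be shown to the user as a keyword).
    """
    leaders: List[Sequence[float]] = []
    clusters: List[List[str]] = []
    for word_id, vec in features.items():
        for idx, leader in enumerate(leaders):
            if euclidean(vec, leader) <= threshold:
                clusters[idx].append(word_id)
                break
        else:  # no leader close enough: this word starts a new cluster
            leaders.append(vec)
            clusters.append([word_id])
    # Larger clusters = word shapes that occur more often in the collection.
    clusters.sort(key=len, reverse=True)
    return clusters[:top_k]


if __name__ == "__main__":
    # Hypothetical 2-D descriptors for five segmented word images.
    descriptors = {
        "p1_w1": [0.0, 0.0],
        "p1_w2": [0.1, 0.0],
        "p1_w3": [5.0, 5.0],
        "p2_w1": [0.0, 0.1],
        "p2_w2": [5.1, 5.0],
    }
    print(frequent_word_clusters(descriptors, threshold=0.5, top_k=2))
```

Greedy leader clustering depends on the input order and on the chosen threshold; a production system might prefer a more robust clustering method, but the idea of ranking clusters by size to find frequent words stays the same.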

The application provides full functionality for the organisation, management and visualisation of a complete document collection.

Figure 1 - Nearest estimated words from specified keyword



Figure 2 - Query by example


Publications

Availability

For licensing information, please contact the NCSR IMPACT group.

OCR Post-correction and Enrichment