- Group: Text Recognition
This tool provides an integrated GUI for indexing historical documents without an OCR engine. It allows searching the database for instances of a query keyword using three different methods:
- Select the query from a predefined list of keywords.
- Define the query by an example.
- Type the query as text.
Historical printed documents contain a vast amount of valuable information. Robust indexing of these documents is essential for quick and efficient use of historical collections. Traditionally, this indexing has been done by means of OCR.
OCR produces its best results on well-printed, modern documents. Historical documents, however, exhibit a range of defects that can reduce recognition accuracy: poor paper quality, poor typesetting, damage or degradation of the original paper, and text skew or warping due to age or humidity.
The IMPACT Word Spotting tool represents a new approach to overcoming these difficulties. It works by segmenting documents into individual words and compiling a list of the most common words (keywords) in the text. Users can then search the collection for a keyword in one of three ways:
- by using a predefined keywords list
- by providing an image example as a query
- by typing the query as plain text
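The segmentation and query-by-example steps described above can be illustrated with a minimal sketch. This is not the IMPACT tool's actual algorithm: the function names (`segment_words`, `dissimilarity`, `spot`), the gap-based segmentation, the simple pixel-mismatch score and the `min_gap` and `threshold` values are all assumptions chosen for illustration, and the input is assumed to be an already binarized text line.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def segment_words(line: Image, min_gap: int = 2) -> List[Image]:
    """Split a binarized text line into word images.

    Words are separated wherever at least `min_gap` consecutive
    columns contain no ink (an assumed, simplified criterion).
    """
    if not line:
        return []
    width = len(line[0])
    # Vertical projection: a column counts as inked if any row has ink in it.
    inked = [any(row[x] for row in line) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None   # first column of the word currently being read
    last_ink = -1  # most recent inked column
    for x, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = x
            elif x - last_ink - 1 >= min_gap:
                # The blank run was wide enough: close the previous word.
                spans.append((start, last_ink + 1))
                start = x
            last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return [[row[a:b] for row in line] for a, b in spans]


def dissimilarity(a: Image, b: Image) -> float:
    """Fraction of mismatching pixels after zero-padding to a common size."""
    height = max(len(a), len(b))
    width = max(len(a[0]), len(b[0]))

    def pixel(img: Image, y: int, x: int) -> int:
        return img[y][x] if y < len(img) and x < len(img[0]) else 0

    mismatches = sum(
        pixel(a, y, x) != pixel(b, y, x)
        for y in range(height)
        for x in range(width)
    )
    return mismatches / (height * width)


def spot(query: Image, words: List[Image], threshold: float = 0.2) -> List[int]:
    """Return indices of words within `threshold` of the query, best first."""
    scored = sorted((dissimilarity(query, w), i) for i, w in enumerate(words))
    return [i for d, i in scored if d <= threshold]


# Toy text line with three "words" separated by three blank columns.
rows = [
    "101000111000101",
    "010000101000010",
    "101000111000101",
]
line = [[int(c) for c in row] for row in rows]
words = segment_words(line)
matches = spot(words[0], words)  # use the first word image as the query
```

In this toy line the first and third words are identical, so querying with the first word image returns both of them and skips the second. A real system would also need binarization, skew correction and a more robust matching score, none of which are shown here.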
The application provides full functionality for the organisation, management and visualisation of a complete document collection.
- IMPACT deliverable D-TR4.3 Word Spotting Prototype (February 2011)
- Kesidis, A., E. Galiotou, B. Gatos and I. Pratikakis. “A word spotting framework for historical machine printed documents.” IJDAR, International Journal on Document Analysis and Recognition.
- Kesidis, A. “Efficient Cut-off Threshold Estimation for Word Spotting Applications.” ICDAR2011, 18-21 September, Beijing, China.
- Colutto, S. and B. Gatos. “Efficient Word Recognition Using A Pixel-Based Dissimilarity Measure.” ICDAR2011, 18-21 September, Beijing, China.
- Gatos, B. “IMPACT Tools Developed by NCSR.” IMPACT Final Conference 2011, 24-25 October, London, UK.