Use this toolkit when building your own OCR workflow from tools supplied by different vendors. If you need an end-to-end solution, use the ABBYY FineReader Engine.
This part of the toolkit was developed and released to support the ABBYY FineReader Engine 10 for Windows.
Binarisation is the transformation of a colour or greyscale image into a black-and-white image. It is applied before OCR to emphasise the difference between text and background content: the contrast between black and white makes it easier for an OCR engine to distinguish significant text detail from the background.
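To make the transformation concrete, here is a minimal sketch of the simplest form of binarisation: a single global threshold applied to every pixel. It is an illustration only and is not taken from the IMPACT toolkit or the ABBYY FineReader Engine; the function names `to_grey` and `binarise`, the default threshold of 128, and the sample pixel values are all assumptions made for this example.

```python
# Illustrative sketch only: hypothetical helper names, not an IMPACT or ABBYY API.

def to_grey(pixel):
    """Reduce an (R, G, B) pixel with 0-255 channels to one 0-255 grey level.

    Uses the common ITU-R BT.601 luma weights.
    """
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarise(image, threshold=128):
    """Return a black-and-white copy of a 24-bit colour image.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Pixels darker than `threshold` become 0 (black), all others 255 (white).
    """
    return [
        [0 if to_grey(pixel) < threshold else 255 for pixel in row]
        for row in image
    ]


# A tiny made-up "page": cream background with a dark ink stroke down the middle.
page = [
    [(250, 245, 230), (40, 35, 30), (248, 244, 228)],
    [(245, 240, 225), (60, 50, 45), (250, 246, 231)],
]

binary = binarise(page)
for row in binary:
    print(row)
```

Every output pixel is either 0 or 255, so the ink stroke and the paper are separated as sharply as possible, which is the property an OCR engine benefits from.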
There are different types and levels of binarisation that can be applied, and not all of them will be appropriate for every image: careless binarisation can, for instance, effectively delete softly inked text from a digital image, making it unreadable.
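The risk described above can be sketched in a few lines. The example below compares a fixed threshold with one chosen per image by Otsu's method, a widely used global thresholding technique. Otsu's method is used here purely as an illustration; the source does not say which algorithms the toolkit offers. The function name `otsu_threshold` and the grey levels in `faint_line` are assumptions made for this sketch.

```python
# Illustrative sketch only: shows how a careless threshold can erase faint text.

def otsu_threshold(levels):
    """Pick a threshold for 0-255 grey levels using Otsu's method.

    Returns t such that pixels with value < t are treated as black. The chosen
    t maximises the between-class variance of the two resulting groups.
    """
    hist = [0] * 256
    for value in levels:
        hist[value] += 1

    total = len(levels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    dark_count = 0  # number of pixels with value < t
    dark_sum = 0    # sum of grey levels of those pixels
    for t in range(1, 256):
        dark_count += hist[t - 1]
        dark_sum += (t - 1) * hist[t - 1]
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue  # one class is empty, so no separation at this t
        dark_mean = dark_sum / dark_count
        light_mean = (sum_all - dark_sum) / light_count
        between_var = dark_count * light_count * (dark_mean - light_mean) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def apply_threshold(levels, t):
    """Map each grey level to 0 (black) if below t, otherwise 255 (white)."""
    return [0 if value < t else 255 for value in levels]


# Made-up scan line: softly inked text (150) on a light background (230).
faint_line = [230, 150, 230, 230, 150, 230]

fixed = apply_threshold(faint_line, 128)  # every pixel turns white: text is lost
chosen_t = otsu_threshold(faint_line)
adaptive = apply_threshold(faint_line, chosen_t)  # faint strokes survive as black

print(fixed)
print(chosen_t, adaptive)
```

With the fixed threshold the soft ink is lighter than the cut-off and disappears entirely, whereas a threshold derived from the image's own histogram keeps it. A single global threshold is still only one option; unevenly lit or stained pages may need locally adaptive approaches.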
A 24-bit colour image of a page with its binarised equivalent. Note the greater contrast between text and background in the binarised image: this makes it easier for an OCR engine to pick out and identify text content.
- IMPACT deliverable IMPACT D-TR1 Image Enhancement Toolkit.pdf (December 2011)
The following screencast explains the use of binarisation in the production of OCR documents, and introduces the IMPACT Project’s Binarisation Tool – a modular and adaptable toolkit that can be used to find the best type of binarisation for particular works, and apply it across a collection.