ABBYY Block Segmentation

Produced by: ABBYY

Segmentation is a major function in an OCR system. During this step, the main document components (text / graphic areas, text lines, words and characters or glyphs) are automatically extracted.
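The component hierarchy listed above (areas, text lines, words, glyphs) can be pictured as a nested data structure. The sketch below is a minimal Python illustration only. The class names, the (top, left, bottom, right) bounding-box convention and the "text"/"graphic" labels are assumptions made for this example. They are not ABBYY FineReader Engine's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Bounding box as (top, left, bottom, right) in pixels (a convention assumed here).
BBox = Tuple[int, int, int, int]


@dataclass
class Glyph:
    bbox: BBox
    text: str = ""  # filled in later by the recognition step


@dataclass
class Word:
    bbox: BBox
    glyphs: List[Glyph] = field(default_factory=list)


@dataclass
class TextLine:
    bbox: BBox
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    bbox: BBox
    kind: str  # "text" or "graphic" (hypothetical labels)
    lines: List[TextLine] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

    def count(self) -> Dict[str, int]:
        """Count the components found at each level of the hierarchy."""
        lines = [ln for b in self.blocks for ln in b.lines]
        words = [w for ln in lines for w in ln.words]
        glyphs = [g for w in words for g in w.glyphs]
        return {"blocks": len(self.blocks), "lines": len(lines),
                "words": len(words), "glyphs": len(glyphs)}


page = Page(blocks=[
    Block((0, 0, 20, 100), "text", lines=[
        TextLine((0, 0, 20, 100), words=[
            Word((0, 0, 20, 40), glyphs=[Glyph((0, 0, 20, 20)), Glyph((0, 20, 20, 40))]),
            Word((0, 50, 20, 100), glyphs=[Glyph((0, 50, 20, 75))]),
        ]),
    ]),
    Block((30, 0, 90, 100), "graphic"),  # graphic areas hold no text lines
])
print(page.count())  # {'blocks': 2, 'lines': 1, 'words': 2, 'glyphs': 3}
```

Keeping every level in one tree means a later stage, such as recognition filling in `Glyph.text`, can always find the line and block a character belongs to.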

Traditionally, historical machine-printed documents have been segmented using techniques designed mainly for contemporary documents.

As a result, several problems inherent in historical documents seriously affect segmentation and, consequently, the recognition accuracy of OCR. These problems include:

  • the generally low quality of the original volume;
  • complex, dense and irregular layouts;
  • artefacts not fully corrected during pre-processing, such as noise between characters, ink diffusion and text skew.

Furthermore, segmentation of historical machine-printed documents usually relies on volume-specific rules. In a mass digitisation workflow this is unworkable, which has made new approaches necessary.

IMPACT introduces novel hierarchical segmentation models that allow the discrete problems of text block, text line, word and character segmentation to be addressed separately while at the same time allowing for interplay between all levels.
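IMPACT's own hierarchical models are not specified on this page. For orientation only, the sketch below shows a much older and simpler hierarchical technique, the classic recursive XY-cut. It splits a binarised page into blocks by cutting along white gaps, alternating between horizontal and vertical cuts. Unlike the IMPACT approach, it has no interplay between levels. The function names, the `min_gap` parameter and the 1 = ink image convention are assumptions for this illustration.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def _ink_runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Return [start, end) runs of non-zero profile values.

    Runs separated by fewer than min_gap empty positions are merged, so small
    gaps (e.g. between words or letters) do not cause a cut.
    """
    runs: List[Tuple[int, int]] = []
    start, end, gap = None, 0, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            end, gap = i + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, end))
                start, gap = None, 0
    if start is not None:
        runs.append((start, end))
    return runs


def xy_cut(img: List[List[int]], box: Box, min_gap: int = 2,
           horizontal: bool = True, tried_other: bool = False) -> List[Box]:
    """Recursively split a binary image (1 = ink) along white gaps.

    Cuts alternate between horizontal (row) and vertical (column) gaps.
    A region that cannot be cut in either direction is returned as a leaf block.
    """
    top, left, bottom, right = box
    if horizontal:
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
    else:
        profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
    runs = _ink_runs(profile, min_gap)
    if not runs:
        return []  # region contains no ink
    if horizontal:
        pieces = [(top + s, left, top + e, right) for s, e in runs]
    else:
        pieces = [(top, left + s, bottom, left + e) for s, e in runs]
    if len(pieces) == 1:
        if tried_other:
            return pieces  # no gap in either direction: a leaf block
        # No cut here; try the other direction on the tightened region.
        return xy_cut(img, pieces[0], min_gap, not horizontal, True)
    blocks: List[Box] = []
    for piece in pieces:
        blocks.extend(xy_cut(img, piece, min_gap, not horizontal, False))
    return blocks


rows = [
    "11110001111",
    "11110001111",
    "00000000000",
    "00000000000",
    "11111111111",
]
img = [[int(ch) for ch in row] for row in rows]
blocks = xy_cut(img, (0, 0, len(img), len(img[0])))
print(blocks)  # [(0, 0, 2, 4), (0, 7, 2, 11), (4, 0, 5, 11)]
```

The toy page splits into two columns and a full-width footer. With `min_gap=3` the two-row gap is bridged, and the whole page comes back as one block. The hand-tuned gap threshold is exactly the kind of volume-specific rule that the text above describes as unworkable at scale.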


Before an OCR engine can recognise characters and words, the print space of the image has to be identified, and within it the paragraphs and lines. This process is known as segmentation. The new IMPACT Block segmentation and classification toolkit is based on ABBYY FineReader Engine 10 for Windows.
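This page does not describe ABBYY's actual algorithm. As a minimal sketch of the first two steps named above, the Python below takes a binarised image (1 = ink, an assumed convention). It first finds the print space as the bounding box of all ink, then splits that box into text lines using a horizontal projection profile. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def print_space(img: List[List[int]]) -> Optional[Box]:
    """Smallest box containing every ink pixel (1) of a binarised image."""
    ink_rows = [r for r, row in enumerate(img) if any(row)]
    if not ink_rows:
        return None  # blank page
    ink_cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return (ink_rows[0], ink_cols[0], ink_rows[-1] + 1, ink_cols[-1] + 1)


def text_lines(img: List[List[int]], box: Box) -> List[Box]:
    """Split a region into text lines with a horizontal projection profile:
    each maximal run of rows that contain ink becomes one line."""
    top, left, bottom, right = box
    lines: List[Box] = []
    start = None
    for r in range(top, bottom):
        has_ink = any(img[r][left:right])
        if has_ink and start is None:
            start = r  # a line begins
        elif not has_ink and start is not None:
            lines.append((start, left, r, right))  # the line ended on row r - 1
            start = None
    if start is not None:
        lines.append((start, left, bottom, right))
    return lines


rows = ["0000000", "0110110", "0000000", "0011100", "0000000"]
page = [[int(ch) for ch in row] for row in rows]
space = print_space(page)
print(space)                     # (1, 1, 4, 6)
print(text_lines(page, space))   # [(1, 1, 2, 6), (3, 1, 4, 6)]
```

This only works on a deskewed page with clean white space between lines. Text skew, ink diffusion and dense, irregular layouts are the cases where such a simple profile breaks down on historical material.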


Comparison of old (FR9) and new (FR10) segmentation.

  • Old segmentation (FR9)
  • New segmentation (FR10)



This tool is available under the ABBYY FineReader Engine 10 commercial licence. For further information on licensing, please contact ABBYY’s European Office.

Related content

Tool for text digitisation

ABBYY FineReader Engine 10

The FineReader Engine 10 SDK, released in September 2010, brings improvements in processing speed, recognition accuracy and ease of development, as well as new export formats.

Succeed training materials

ABBYY FineReader

ABBYY FineReader Engine 10

ABBYY FineReader is a widely used, well-documented commercial product for text recognition in images.