NCSR Border Detection and Removal


Scenario


This tool detects and removes noisy black borders as well as noisy text regions. It also detects the optimal page frames of double-page document images.

Abstract


Scanned documents tend to contain graphical information that a researcher does not need, in particular the margins around the text body. The border removal process enhances document images by automatically detecting and cutting out noisy black borders, as well as noisy text regions from neighbouring pages.
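The sketch below illustrates the detect-and-crop idea in its simplest form. It assumes a binarized image stored as a list of rows (1 = dark pixel) and a hypothetical `ratio` threshold: rows and columns at each edge whose fraction of dark pixels is at least `ratio` are treated as black border and cut away. All names here are illustrative; this is not the NCSR algorithm, which also deals with noisy text regions from neighbouring pages.

```python
from typing import List, Tuple

# Assumed representation: a binarized image as a list of rows, 1 = dark pixel.
Image = List[List[int]]


def column_profile(img: Image) -> List[int]:
    """Vertical projection profile: number of dark pixels in each column."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: number of dark pixels in each row."""
    return [sum(row) for row in img]


def edge_run(profile: List[int], length: int, ratio: float) -> int:
    """Count consecutive leading entries whose dark-pixel fraction is >= ratio."""
    n = 0
    for value in profile:
        if value / length < ratio:
            break
        n += 1
    return n


def detect_border_box(img: Image, ratio: float = 0.8) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the area left after stripping
    mostly-dark rows and columns at each edge; bottom and right are exclusive.
    `ratio` is an assumed threshold, not a value taken from the NCSR tool."""
    h, w = len(img), len(img[0])
    cols, rows = column_profile(img), row_profile(img)
    left = edge_run(cols, h, ratio)
    right = max(left, w - edge_run(cols[::-1], h, ratio))
    top = edge_run(rows, w, ratio)
    bottom = max(top, h - edge_run(rows[::-1], w, ratio))
    return top, bottom, left, right


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut the image down to the detected box."""
    top, bottom, left, right = box
    return [row[left:right] for row in img[top:bottom]]


if __name__ == "__main__":
    page = [
        [1, 1, 1, 1, 1, 1, 1, 1],  # dark border row at the top
        [1, 1, 0, 0, 0, 0, 0, 0],  # two dark border columns on the left
        [1, 1, 0, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0, 0, 0, 0],
    ]
    box = detect_border_box(page)
    print(box)  # (1, 6, 2, 8)
    for row in crop(page, box):
        print(row)
```

A fixed threshold is the weakest part of this sketch: real borders are uneven, partly broken, or merge with text from the facing page, so a single ratio test on the edge profiles will miss or over-cut in many scans.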

The method is based on projection profiles combined with a connected component labeling process. Signal cross-correlation is then used to verify the detected text areas.
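The building blocks named in the paragraph above can be sketched as follows, again assuming a binarized image stored as a list of rows (1 = dark pixel). `label_components` performs 4-connected component labeling with a breadth-first flood fill, `bounding_boxes` turns the labels into candidate regions, and `normalized_cross_correlation` computes the zero-lag normalized cross-correlation of two equal-length 1-D signals, such as the projection profiles of two candidate regions. All function names are hypothetical, and the source does not say how the toolkit turns these scores into an accept or reject decision, so no decision rule is implied here.

```python
from collections import deque
from math import sqrt
from typing import Dict, List, Tuple

Image = List[List[int]]  # assumed: binarized image, 1 = dark pixel


def label_components(img: Image) -> Tuple[List[List[int]], int]:
    """Label 4-connected groups of dark pixels; returns (label map, count).
    Background pixels keep label 0; components are numbered 1..count."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of the current component
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels: List[List[int]], count: int) -> Dict[int, Tuple[int, int, int, int]]:
    """Inclusive (top, left, bottom, right) box for each label 1..count."""
    h, w = len(labels), len(labels[0])
    boxes = {k: [h, w, -1, -1] for k in range(1, count + 1)}
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                b = boxes[k]
                b[0], b[1] = min(b[0], y), min(b[1], x)
                b[2], b[3] = max(b[2], y), max(b[3], x)
    return {k: (b[0], b[1], b[2], b[3]) for k, b in boxes.items()}


def normalized_cross_correlation(a: List[float], b: List[float]) -> float:
    """Zero-lag normalized cross-correlation of two equal-length signals.
    Returns a value in [-1, 1], or 0.0 if either signal is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


if __name__ == "__main__":
    img = [
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    labels, count = label_components(img)
    print(count)  # 2
    print(bounding_boxes(labels, count))  # {1: (0, 0, 1, 1), 2: (1, 3, 2, 3)}
    print(normalized_cross_correlation([1, 2, 3], [2, 4, 6]))  # 1.0
```

A score near 1 means two profiles rise and fall together, which is the kind of similarity a verification step can test for; which signals are compared and what threshold applies are left open because the source does not specify them.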

Removing these areas (commonly known as border removal) makes such texts easier to read on screen and also reduces the overall file size, so the images are easier for an institution to store and quicker to transfer remotely to a researcher.

The following screencast introduces the theory behind border removal, as well as the IMPACT border removal toolkit, a modular and adaptable programme.

Border removal Input 1

Border removal Output 1

Border removal Input 2

Border removal Output 2

Border removal Input 3

Border removal Output 3

Availability

For information on licensing, please contact the NCSR IMPACT group.