Input

Input Image Formats

FineReader accepts a wide range of input image formats, including bmp, dcx, pcx, png, jpeg2000, jpeg, pdf, tiff, gif, djvu, jbig2, wdp and wic. It will not open images larger than 32512*32512 pixels. For other image formats, consult the SUCCEED tool list to find conversion utilities.
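The two constraints above can be checked before a file is submitted for recognition. The following is a minimal sketch, not part of FineReader: the function name check_input, the mapping from format names to file extensions, and the reading of the size limit as applying to width and height separately are all assumptions made for illustration. The pixel dimensions are passed in by the caller, so no imaging library is required.

```python
from pathlib import Path

# Assumed mapping from the formats named above to common file extensions.
# "wic" is omitted because it is not tied to a single extension.
SUPPORTED_EXTENSIONS = {
    ".bmp", ".dcx", ".pcx", ".png", ".jp2", ".j2k", ".jpg", ".jpeg",
    ".pdf", ".tif", ".tiff", ".gif", ".djvu", ".djv", ".jb2", ".jbig2", ".wdp",
}

# Assumption: the 32512*32512 limit applies to each side separately.
MAX_SIDE = 32512


def check_input(filename: str, width: int, height: int) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if width > MAX_SIDE or height > MAX_SIDE:
        problems.append(f"image too large: {width}x{height} pixels")
    return problems


# Example: an A4 page scanned at 300 dpi, and an oversized scan.
print(check_input("page_0001.tif", 2480, 3508))
print(check_input("map_sheet.tif", 40000, 28000))
```

A file that fails the extension check would be a candidate for one of the conversion utilities; a file that fails the size check would need to be downscaled or split first.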

Supported Languages

FineReader reads documents in 188 languages (cf. finereader.abbyy.com/recognition_languages/), of which 45 have dictionary support.

Limitations

Even if your image format is supported and your language is implemented, your recognition results will not necessarily be satisfactory. The main reasons for suboptimal results are:

• Poor quality images, for instance low-resolution black and white images from old microfilms

• Degraded documents (warped, unclear printing, damaged, …)

• Font shapes unknown to the engine

• A mismatch between the implemented language support and the actual language of your documents: a language may be listed as supported and still be handled poorly if the documents contain specific terminology or historical or regional language

Extending Language Support

A limited number of words can be added as a user dictionary. Unfortunately, there is no utility to produce FineReader dictionaries from user word lists. One possible approach to implementing language support is the External Dictionary mechanism in the ABBYY SDK, which is described in a separate document.

Training Character Shapes

It is possible to train the engine on unknown or unusual character shapes. Both the engine and the desktop user interface offer the option to train a “user pattern” during recognition. User patterns may be saved and loaded for later recognition jobs.

This option may improve recognition of unusual character shapes, but it cannot reach the quality that would be obtained by full training of the engine, which can only be done by ABBYY. Bear in mind that trained shapes are applicable only to images that share not only the same font shapes, but also the same image parameters (resolution, quality, colour depth). For a really robust extension to different font shapes, you have to contact ABBYY (http://www.abbyy.com/support/).

For a comparison of the trainability of FineReader and Tesseract OCR, cf. for instance the case study http://lib.psnc.pl/dlibra/doccontent?id=358, included with the training materials.