Succeed Project Tools Evaluation: Wielkopolska Digital Library

Silvia García

One of the main objectives of the Succeed project is the evaluation and take-up of tools in a production environment. To meet this objective, the four libraries of the Succeed consortium and nine external libraries each selected a minimum of two digitisation tools to evaluate.

The Wielkopolska Digital Library selected three tools: two for image enhancement (ImageMagick and Scan Tailor) and JHOVE, a joint project to develop an extensible framework for format validation.

Below, the Wielkopolska Digital Library provides an overview of the results of its evaluation:

1- Might the program prove useful for mass digitization?

  • ImageMagick:

In our tests, ImageMagick was used to convert the initial TIFF files to JPG format in order to reduce the size of the files passed on for further conversion (to DjVu and PDF). However, although the JPG input files were significantly smaller than the TIFF originals, the DjVu and PDF output files were not. The program therefore did not provide the expected benefits.

However, since the program offers a much greater number of functions than just conversion to JPG format, and these functions can be accessed from the command line or through APIs available for many languages (including Perl, C, C++, Python and PHP), the program certainly may prove useful for mass digitization.
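As a rough illustration of the command-line route, the sketch below builds the ImageMagick `convert` call for a TIFF-to-JPG conversion from Python. The function names, the quality value of 85 and the directory layout are assumptions made for this example, not the settings used in our tests; ImageMagick 7 installations typically expose the same operation as `magick`.

```python
import subprocess
from pathlib import Path


def build_convert_command(src: Path, dst_dir: Path, quality: int = 85) -> list[str]:
    """Return the ImageMagick command that converts one TIFF file to JPG."""
    dst = dst_dir / (src.stem + ".jpg")
    # "convert" is the ImageMagick 6 entry point; -quality sets the JPEG quality.
    return ["convert", str(src), "-quality", str(quality), str(dst)]


def convert_batch(src_dir: Path, dst_dir: Path, quality: int = 85) -> list[Path]:
    """Convert every *.tif file in src_dir and return the files that failed."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(src_dir.glob("*.tif")):
        result = subprocess.run(build_convert_command(src, dst_dir, quality))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Keeping the command construction separate from the call to `subprocess.run` makes it possible to inspect or log the exact command before any file is touched.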

  • Scan Tailor:

This tool is useful in the process of mass digitization.

Conversion using Scan Tailor improves the visual quality of the files (horizontal alignment, addition of margins, etc.), depending on which conversion parameters were configured.

The program will be used in the production process of digitization and publication to improve the visual quality of the processed materials.

  • JHOVE:

This tool is very useful in the process of mass digitization.

In our tests, the program was used for:

1. Checking compliance of the input files with the TIFF format, based on:

  • information on fields (tags) in the TIFF files;
  • information on validation results.

2. Checking the values of selected fields (tags):

  • resolution (min. 300 DPI) – tags 282, 283 and 296
  • compression – tag 259 (permissible values: 1, 5)
  • colour space – tag 262 (permissible values: 0, 1, 2)
  • FillOrder – tag 266 (permissible value: 1)

and reporting deviations from accepted values.

In spite of the long time required for processing, the program will be used in the production process of digitization and publication, for the validation of master files according to selected criteria before the start of processing.
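The field checks listed above can be sketched as follows. This is only an illustration, not our production script: it assumes the tag values have already been extracted from the validation report into a plain dictionary keyed by TIFF tag number, and it treats a missing tag as a deviation, which is a choice made for this example. The unit codes for tag 296 (2 = inch, 3 = centimetre) follow the TIFF 6.0 specification.

```python
# Permissible values from the list above, keyed by TIFF tag number.
ALLOWED_VALUES = {
    259: {1, 5},     # Compression
    262: {0, 1, 2},  # PhotometricInterpretation (colour space)
    266: {1},        # FillOrder
}
MIN_DPI = 300


def resolution_in_dpi(value, unit):
    """Convert a resolution value to DPI using ResolutionUnit (tag 296)."""
    if unit == 2:  # inch
        return value
    if unit == 3:  # centimetre
        return value * 2.54
    return None    # unit 1 (no absolute unit), missing or unknown


def find_deviations(tags: dict) -> list[str]:
    """Return a human-readable list of deviations from the accepted values."""
    deviations = []
    for tag, allowed in ALLOWED_VALUES.items():
        # Assumption for this sketch: a missing tag counts as a deviation.
        if tag not in tags:
            deviations.append(f"tag {tag}: missing")
        elif tags[tag] not in allowed:
            deviations.append(f"tag {tag}: value {tags[tag]} not in {sorted(allowed)}")
    unit = tags.get(296)
    for tag in (282, 283):  # XResolution, YResolution
        dpi = resolution_in_dpi(tags[tag], unit) if tag in tags else None
        if dpi is None:
            deviations.append(f"tag {tag}: resolution cannot be expressed in DPI")
        elif dpi < MIN_DPI:
            deviations.append(f"tag {tag}: {dpi:.0f} DPI is below {MIN_DPI}")
    return deviations
```

For instance, a file with compression value 7 and a horizontal resolution of 200 DPI would produce two entries in the returned list, one for tag 259 and one for tag 282.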

2- Does use of the program significantly increase the scan processing time?

The processing time is increased significantly when the source files are checked using JHOVE2. Depending on the number and size of files, the time may be as much as seven times longer than in the case of processing involving only conversion to DjVu.

In the case of Scan Tailor and ImageMagick, the time is twice as long as in the case of processing involving only conversion to DjVu.

3- Is special training required for the librarian using the program?

Processing is fully automatic and runs as a batch process on the server. In case of processing errors, the librarian (operator, editor) receives notification of the need to rescan the files using set parameters and to send the files to the server. Therefore the librarian requires no special training beyond that which is required for the everyday work of a scanner operator and editor in a digital library.

4- Do you intend to use this program at WDL (Wielkopolska Digital Library, http://www.wbc.poznan.pl/dlibra)?

Following the tests, we decided to use the following programs in the production process of digitization and publication: JHOVE2 (for validation of master files according to selected criteria) and Scan Tailor (for improvement of the visual quality of the processed materials).

In the end, ImageMagick did not meet our expectations.

5- What are your greatest concerns?

We were concerned that the use of JHOVE2 might be somewhat “overblown” for our needs. Our tests show that the conversion of source files using Scan Tailor before final conversion to the DjVu presentation format eliminates any errors in the TIFF files which might prevent further processing. The question therefore arises as to whether it is appropriate to use JHOVE2, and thereby significantly increase the processing time, given that the files undergoing conversion to DjVu are in any case pre-processed and free of the errors which could previously hold up the entire process.

On the other hand, checking of the source files provides an assurance that the archived master files conform to the standard and that there will not be any problems with them in the future when it becomes necessary to use them for other purposes.

Another problem is errors in the JHOVE2 software – some of the processed files generate exceptions, and the program crashes. We have sent a query in this matter to the program’s creators, attaching samples of files whose processing led to such errors.

We also have doubts about the use of Scan Tailor in batch mode. Batch processing using Scan Tailor needs to be applied with care – the conversion parameters need to be carefully chosen and tested, and the output files should themselves undergo an additional check, because the results of the program’s operation with fixed parameters on differing input files may not be as expected.
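One simple form of the additional check mentioned above is sketched below. It only catches gross failures (a missing or implausibly small output file) and does not replace a visual inspection. The function name, the 1024-byte threshold and the assumption that each output file keeps the name of its input file in a separate directory are all hypothetical choices for this example.

```python
from pathlib import Path


def find_suspect_outputs(input_dir: Path, output_dir: Path, min_bytes: int = 1024) -> list[str]:
    """Return names of input TIFFs whose output is missing or implausibly small."""
    suspects = []
    for src in sorted(input_dir.glob("*.tif")):
        # Assumption: the output file has the same name as the input file.
        out = output_dir / src.name
        if not out.is_file() or out.stat().st_size < min_bytes:
            suspects.append(src.name)
    return suspects
```

Files returned by such a check could then be set aside for manual review or reprocessed with different parameters.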

6- Does the program require additional or special devices?

Each of the programs was easy to install, configure, and integrate with the system currently used. There are no special requirements beyond those already met by the hardware and software currently in use.

7- Are you aware of any better solutions than this program?

Before the programs were chosen for the tests, alternative solutions were sought and analysed. A basic requirement was the ability to run the programs from the command line or through APIs available for selected programming languages, with Perl preferred. The three programs chosen were those which at the time appeared to be the best for the selected uses.

8- What was the reason for using this program?

The system currently used for automated conversion and publication of documents in the digital library – a self-developed system, using a set of Perl scripts – does not address the issues of verification of the processed files, minimization of output file sizes, and improvement of their visual quality. The idea arose of making use of available free tools implementing appropriate functionalities. Three programs were chosen, and it was decided to test their usefulness in terms of:

a) verification of compliance of input files with the TIFF format – JHOVE2;

b) conversion to JPG format to reduce the size of the output files – ImageMagick;

c) improvement of visual quality of output files – Scan Tailor.