The IMPACT Framework – From Tools to Workflows


This practical session started with the attendees introducing themselves and splitting up into three groups, so that each group could work on a different set of tasks based on a case study.

Sven Schlarb at IMPACT/myGrid Hackathon


Case Study:

A collection holder wants to reduce storage costs for her/his collections,
which are currently available as TIFF master files. She/he has heard that
JPEG2000 is a good candidate for storing digital master files, and that lossy
compression can reduce file sizes considerably.

She/he knows that JPEG2000 compression can be “visually lossless”, meaning
that the loss is not visible to the human eye, but she/he is still concerned
about the impact the JPEG2000 compression could have on OCR.

We suggest building a Taverna workflow that serves as an executable processing
pipeline for studying this impact.

The workflow should take two inputs: one TIFF image and a list of increasing
compression parameters, each of which is used to encode the image as JPEG2000.
Each compressed image should then be decompressed before OCR is applied.
Finally, the impact of the compression on the OCR should be measured by
comparing the OCR output of the original image to the OCR output of each
compressed image.
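
The steps of the suggested workflow can be sketched outside Taverna as a short Python function. This is only an illustration of the data flow, not the workflow built at the hackathon: the function name `compression_study` and the `encode`, `decode`, `ocr` and `compare` callables are hypothetical placeholders for whatever JPEG2000 tool, OCR engine and text comparison are actually plugged in.

```python
from typing import Callable, Dict, Iterable


def compression_study(
    tiff: bytes,
    levels: Iterable[int],
    encode: Callable[[bytes, int], bytes],   # hypothetical: TIFF -> JPEG2000 at a given level
    decode: Callable[[bytes], bytes],        # hypothetical: JPEG2000 -> raster image
    ocr: Callable[[bytes], str],             # hypothetical: image -> recognised text
    compare: Callable[[str, str], float],    # hypothetical: similarity of two texts
) -> Dict[int, float]:
    """Return one similarity score per compression level."""
    # OCR the uncompressed master once; every result is compared against it.
    reference_text = ocr(tiff)

    scores: Dict[int, float] = {}
    for level in levels:
        compressed = encode(tiff, level)      # encode with this compression parameter
        restored = decode(compressed)         # decompress before applying OCR
        scores[level] = compare(reference_text, ocr(restored))
    return scores
```

Passing the tools in as functions mirrors the way a Taverna workflow wires independent services together, and it lets the pipeline be tried out with simple stand-ins before real tools are available.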

IMPACT myGrid Taverna Hackathon


The Three Groups:

Group 1

Use the toolwrapper for providing access to a JPEG2000 encoding/decoding tool:

Group 2

Use Taverna for creating the workflow:

Group 3

Use a Taverna Beanshell for creating the text comparison:

  • commons-lang-2.4.jar (/home/<youruser>/.taverna-home/lib/commons-lang-2.4.jar)

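
The notes do not say which comparison the group implemented. One common choice for comparing two OCR outputs is the Levenshtein edit distance, which Commons Lang 2.4 offers as `StringUtils.getLevenshteinDistance`; that this is why the jar is listed is an assumption. The Python sketch below hand-codes the same measure so that it is self-contained, and the function names `levenshtein` and `similarity` are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(reference: str, candidate: str) -> float:
    """Score between 0.0 and 1.0, where 1.0 means the texts are identical."""
    longest = max(len(reference), len(candidate))
    if longest == 0:
        return 1.0  # two empty texts are treated as identical
    return 1.0 - levenshtein(reference, candidate) / longest
```

Normalising by the length of the longer text makes scores comparable across pages of different sizes, so a drop in the score at higher compression levels points to OCR degradation rather than to a longer page.
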
Carl Wilson from the BL concentrates on Taverna


The selection of groups showed a definite preference for the more user-oriented tasks over the developer-oriented ones, with 12 attendees working in Group 1, 6 in Group 2 and only 3 in Group 3. However, quite a few attendees seemed happy to be involved in more than one group, or to work in one group while supporting users in another.

The general feeling is that this bodes well for tomorrow, which has a more practical timetable.