The Question & Answer Session


Better late than never: here is a short summary of the question and answer session. It is NOT a literal transcript. Unfortunately, Sven Schlarb could not take part in the session.

Question from the audience: Is the main aim of IMPACT to provide better OCR results for Fraktur / Gothic texts, or for historical language in general?

Answer by Aly Conteh: Better OCR results for Fraktur / Gothic scripts are only one of the areas in which IMPACT wants to make improvements. In general, IMPACT is about improving text recognition software and historical language research, based on the material the partner libraries provide, that is, what they have already digitised or plan to digitise in the near future.

Question from the audience: Regarding the encoding of text, what are IMPACT’s thoughts on TEI?

Answer by Günter Mühlberger: For now, the focus is on METS/ALTO, but tools like the Functional Extension Parser (FEP) provide alternative means of describing document structure. From these it would be possible to create benchmark structures that could be used to build stylesheets for TEI encoding.

Additional remark by Katrien Depuydt: For the lexicon work in IMPACT, TEI is used.

Question from the audience: IMPACT is focused on mass digitisation, but many of the tools shown require a lot of human interaction. How does that fit together?

Answer by Aly Conteh: Some tools will always require human interaction or are specifically designed around it, like the collaborative correction tool CONCERT that was shown. In general, until now the focus has been on getting the tools running. Now that this is done, the project's focus is firmly on raising performance to the throughput level required for mass digitisation.

Question from the audience: When and how (by what licensing model) will non-IMPACT members be able to use the project’s tools?

Answer by Aly Conteh: At the end of the project in 2011, the tools will be available in a variety of ways and under different licensing models. For example, Abbyy's findings within IMPACT will be built into the next FineReader engine. Other tools may become open source, be made available commercially, or be released in other ways. The question of licensing is already being discussed urgently within the project.

Question from the audience: Why does IMPACT rely exclusively on the proprietary Abbyy OCR?

Answer by Michael Fuchs: The IMPACT framework is designed to be open. The default mode uses Abbyy FineReader, but users will be able to plug their preferred OCR engines and dictionaries into the system. Abbyy is part of IMPACT simply because it is the company with the most experience in Gothic script OCR.

Additional answer by Aly Conteh: IBM is also developing an OCR solution as part of the project. In addition, some of the demonstrators that will be set up to show the tools in action will use OCRopus, for example.

Final remark by Michael Fuchs: OCR of historical texts is indeed like climbing Mount Everest, compared to the 'easy walk' of OCRing contemporary texts. As with many projects, reaching 80% of your goal is easy, but the last 20% takes the most effort.

Niall Anderson, BL + Mark-Oliver Fischer, BSB