Tutorial at TPDL 2013: State-of-the-art tools for text digitisation


The goal of this tutorial (organised by the Succeed project within the TPDL Conference, http://www.tpdl2013.info/, in Valletta, Malta, 22-26 September 2013) is to give participants practical experience with a number of state-of-the-art digitisation and text-processing tools developed in recent research projects. The tutorial will focus on hands-on demonstrations and on testing the tools in real-life situations, including material provided by the participants. The learning objectives are:

  • Gain practical insight into the most recent developments in text digitisation techniques.
  • Identify strengths and usability weaknesses of existing tools.
  • Gain a better understanding of the effect of new tools and resources on productivity.
  • Discuss the requirements and effects of their integration in the production workflow.

This tutorial will give participants a unique opportunity to gather information about tools created in research projects, to test and evaluate their usability, and to find out how to benefit from using them. Conversely, researchers will benefit from practitioners' comments and suggestions.

Further information about the tools demonstrated and the tutorial program is available at http://succeed-project.eu/wiki/index.php/TPDL_Tutorial_State-of-the-art_tools_for_text_digitisation

Target audience

The tutorial is intended for librarians, archivists and museum staff involved in text digitisation. Attendees might have basic knowledge of digitisation strategies.


Bob Boelhouwer (Instituut voor Nederlandse Lexicologie – INL). He is a computational linguist and holds a PhD (thesis: "From letter strings to phonemes: the role of orthographic context in phonological recoding", 1998). His relevant experience includes the integration of lexical resources and the implementation of tools to explore lexical resources.

Adam Dudczak (Poznań Supercomputing and Networking Centre – PSNC). He holds a Master's degree in Computer Science and is a member of the Digital Libraries Team, a division of PSNC. He leads the development of the Virtual Transcription Laboratory (VTL), a portal that integrates custom cloud-based OCR with a handy editing interface allowing crowdsourced text correction. Apart from this, Adam works on the development of the e-learning materials related to digital libraries and digitisation created during the ACCESS IT and ACCESS IT plus projects. These courses include e-learning materials and a dedicated operating system, DigitLab, which integrates free and widely known tools useful for digitising various kinds of documents (including textual objects). Adam is also an experienced trainer who has worked with various communities in the areas of digital libraries and software development.

Sebastian Kirch (Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme – IAIS). He received his diploma in information technology from the University of Marburg in 2009, majoring in distributed systems. At Fraunhofer IAIS he works on the design and implementation of distributed software applications for document analysis and enrichment. Sebastian has been active in several research projects, most recently as project leader of "MediaGrid", a project funded by the BMBF (German Federal Ministry of Education and Research). http://s.fhg.de/kirch

Date and Venue

The venue for the tutorials is the main conference hotel (Hotel Excelsior) in Valletta (information on the venue will be posted soon on http://www.tpdl2013.info/). The tutorial will take place on 22 September 2013.

Overview and slides

An overview of the tutorial is posted on the Impact Centre of Competence blog at http://blog.dev.digitisation.eu/blog/state-of-the-art-tools-for-text-digitisation-tutorial-tpdl-2013/.

The slides can be downloaded from http://succeed-project.eu/event/tpdl-tutorial-state-art-tools-text-digitisation-0.