IMPACT Members’ Meeting 2017


On May 31st, we held the Annual Members’ Meeting of the IMPACT Centre of Competence.

The day started with the Executive Board meeting, where relevant aspects of IMPACT were discussed, such as the action plan for the remainder of 2017 and for 2018. After this meeting, all attendees joined the board members, and Francis Ballesteros, Director of IMPACT, welcomed the new members that joined the Centre in 2016 and the first half of 2017:

  • Jouve
  • Europeana
  • SUB Göttingen
  • iFunPlay
  • Universidad de Guadalajara
  • Ateneu Barcelonès
  • Institut d’Histoire des Représentations et des Idées dans les Modernités (IHRIM)
  • Stanford University Libraries
  • Staatsbibliothek zu Berlin
  • Swiss National Library

Francis also presented the management and dissemination activities and the financial report for 2016, and updated the members on the latest news about the Latin American expansion of IMPACT, led by Universidad de Guadalajara (Mexico).

After that, Isabel Martínez, IMPACT Manager, presented the results of the survey completed by our members, from which the action plan of IMPACT will be drafted. Isabel also presented the Centre's new dissemination materials (slides, flyers) and the new version of our website, which will soon be launched to the general public.


After the lunch break, our members took the floor to present their recent work. This is a very important exercise, since it lets our members catch up on the latest advances in digitisation made by each other's institutions, start discussions, and share ideas and best practices.

The first speaker was Lotte Wilms, from Koninklijke Bibliotheek (KB), who presented a research project on the evaluation and post-correction of the OCR of digitised newspapers at KB. The Delpher corpus is composed of digitised Dutch newspapers published between 1618 and 1995 and currently contains 11M pages. The corpus is expected to reach 20M pages by 2020. The project has two main objectives: insight into the quality of KB’s OCR and insight into automated methods of post-correction. The results offer fully searchable text and can be visited at www.delpher.nl.


Gustavo Candela, from Fundación Biblioteca Virtual Miguel de Cervantes, updated us on their application of linked open data to a digital library. These advances are presented at data.cervantesvirtual.com. The first release of this linked open data website, based on RDA and FRBR, was launched in 2015, with the main objective of promoting data sharing, interoperability, data re-use and the dissemination of best practices. The lessons learnt about the keys to success when applying linked open data to libraries are: preprocessing the sources in order to minimise errors, re-using vocabularies (RDA and FRBR), identifying access points to the data, enriching the metadata, and increasing visibility through social media.

Then it was the turn of Tomasz Parkola, from Poznań Supercomputing and Networking Center, to present their DInGO toolset, a comprehensive toolset with 130 deployments and 2.5M digital objects in 300 institutions. The DInGO toolset is composed of the tools dLibra, for digital libraries and repositories; dMuseion, for digital museums and galleries; dLab, for digitisation process management; and dArceo, for long-term preservation. The key characteristics of DInGO are interoperability, scalability and flexibility. After introducing the toolset, Tomasz Parkola showed us some examples, case studies and a demo of DInGO.

Katrien Depuydt, from Instituut voor de Nederlandse Taal, presented their Nederlab project, which runs from 2013 to June 2018. One of the main objectives of the project is the creation of a research environment for historians, linguists, literary scholars, etc. It is based on a user-friendly, tool-enriched web interface that provides different levels of access to data that is diverse in content, formats, text quality and metadata quality. Katrien showed us the different steps in corpus processing, which include acquisition, analysis, conversion, curation, OCR quality and enrichment. Then she presented the case of the Huygens ING corpus, with the OCR of 450 volumes and text editions in TEI XML. The challenges of this corpus are OCR quality, parsing the structure, separating the editorial matter from the original text, and the metadata. The project can be visited at www.nederlab.nl.

Neil Fitzgerald, from The British Library (BL), updated us on the digitisation programme and digital scholarship at the BL. First of all, Neil presented the BL's vision for 2023 and the main objectives it plans to achieve, such as access to BL digitised content on a single platform, the digitisation of major collections, and collaboration with others on the digitisation of relevant British cultural heritage. The vision is based on a three-pronged strategy: commercial strategy (public-private partnership), partnership strategy (hybrid models) and collection strategy (open access models). Then Neil presented the British Library's current major digitisation initiatives (Two Centuries of Indian Print, Hebrew Manuscripts Digitisation…), the recent improvements to their Digital Library System, and the challenges they face as a result of the project focus.

The second part of the presentation focused on digital scholarship and the programmes it includes, such as the training programme, the 2017 Mozilla Global Sprint, the ICDAR 2017 transcription competition, LibCrowds (the BL's crowdsourcing programme) and the open cultural heritage datasets, as well as the next steps to be taken.

Finally, Nele Gabriëls, from KU Leuven, presented the digitisation programme at her institution. The programme started recently and is on its way to being consolidated. The materials digitised are mainly documentary, and the digitisation is carried out in house using camera-based capture. Currently, the imaging lab has an output of up to 1000 images/day. We are proud to say that the first steps into OCR were made in the frame of the SUCCEED project. Nele also presented the future steps of digitisation at her institution, which are based on a policy-based, open and innovative programme.

In conclusion, the annual meeting was a very fruitful day in which our members had the opportunity to share ideas to improve their work and to work together on making digitisation better, faster and cheaper.