On the 10th and 11th of April 2014, the Succeed project held a hackathon at the University of Alicante. Its aim was to improve state-of-the-art open-source tools for the digitisation of textual content such as books and newspapers.
Over the two days, developers worked together in small groups to discuss and plan the future development of existing tools. Topics for discussion included:
- Training the Tesseract OCR engine.
- Creating XSLT stylesheets for conversion between formats such as hOCR, PAGE and FRXML.
- Generating Debian packages.
The hackathon provided a unique opportunity to meet developers involved in digitisation projects all over Europe.