Cutouts and page-generator (Tesseract OCR customization)

Author: Adam Dudczak (PSNC)

Tesseract is a well-known open-source OCR application that features, among other things, layout analysis and training capabilities. Because Tesseract is a command-line tool, it is easy to integrate into a larger digitisation workflow. This document describes how to create a custom recognition profile for a specific kind of document using a web application called Cutouts and a set of command-line tools called page-generator.
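
As a rough illustration of the workflow point above, the following Python sketch wraps the basic Tesseract invocation (tesseract <image> <output base> -l <language>) so that it can be called from a larger script. The function names, the file names and the "pol" language code are assumptions made for this example only; they do not come from Cutouts or page-generator. A custom recognition profile would be selected the same way, by passing its name as the language.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_page(image, output_base, lang="eng"):
    """Run Tesseract on one page image and return the path of the text output.

    Tesseract appends ".txt" to the output base by itself.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    return Path(f"{output_base}.txt")


if __name__ == "__main__":
    # Hypothetical file names and language code, shown without running Tesseract.
    print(" ".join(build_tesseract_command("page-001.tif", "out/page-001", "pol")))
```

Keeping the command construction separate from the call that runs it makes the step easy to check in a workflow without having Tesseract installed.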