The Functional Extension Parser – a rule-based system for flexible structural analysis


Lukas Gander of the Universitäts- und Landesbibliothek Tirol (University and Regional Library Tyrol) outlines the concept behind the Functional Extension Parser: using an OCR engine’s output to build a structural map of a page or volume. OCR output carries much more information than plain text; for instance, it records the type and position of each piece of text. The Functional Extension Parser (FEP) will spot if, say, numerical values appear repeatedly at the bottom of pages and tag them as page numbers. The same approach applies to tables of contents, chapter headings, indices and formulae. The FEP does this by applying rules designed to model a human’s intuitive understanding of book structure.
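To make the page-number example concrete, here is a minimal sketch of what one such rule could look like. It is not the FEP's actual rule set. The `Word` and `Page` classes, the `bottom_fraction` and `min_ratio` thresholds, and the function names are all hypothetical, standing in for whatever positional data an OCR engine exports. The rule flags purely numeric words in the bottom band of a page and only accepts them as page numbers if that pattern recurs across most of the volume.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for OCR output (not a real OCR format).
@dataclass
class Word:
    text: str
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width
    h: int  # height

@dataclass
class Page:
    height: int
    words: list

NUMERIC = re.compile(r"^\d+$")

def page_number_candidates(page, bottom_fraction=0.1):
    """Return purely numeric words whose top edge lies in the bottom band of the page."""
    cutoff = page.height * (1 - bottom_fraction)
    return [w for w in page.words if NUMERIC.match(w.text) and w.y >= cutoff]

def tag_page_numbers(pages, bottom_fraction=0.1, min_ratio=0.6):
    """Tag page numbers only if numeric bottom-band words recur across the volume.

    Returns a dict mapping page index -> detected page number.
    """
    candidates = [page_number_candidates(p, bottom_fraction) for p in pages]
    hits = sum(1 for c in candidates if c)
    # The "repeatedly" part of the rule: a one-off number at the foot of a
    # single page is not enough evidence.
    if not pages or hits / len(pages) < min_ratio:
        return {}
    result = {}
    for i, c in enumerate(candidates):
        if c:
            lowest = max(c, key=lambda w: w.y)  # pick the lowest numeric word
            result[i] = int(lowest.text)
    return result
```

For example, five pages that each carry a heading near the top and a number at y=950 on a 1000-pixel-high page would be tagged as pages 5 to 9. A real rule engine would likely add further checks, such as whether the detected values increase from page to page.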


One of the key potential benefits of the FEP will be in e-publishing: because the structural information it gathers includes the print space and margins of each page, a print edition can easily be produced from a digital version.
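As a rough illustration of how print space and margins might be derived from OCR coordinates, the sketch below takes the union of all text bounding boxes on a page as the print space and measures the margins as the gaps to the page edges. This is an assumption about the approach, not a description of how the FEP does it; the `Box` class and the function names are hypothetical. The `to_mm` helper uses the standard conversion of 25.4 mm per inch, given the scan resolution in dpi.

```python
from dataclasses import dataclass

# Hypothetical bounding box in pixel coordinates (origin at top-left).
@dataclass
class Box:
    left: int
    top: int
    right: int
    bottom: int

def print_space(boxes):
    """Union of all text bounding boxes on a page, used here as the print space."""
    if not boxes:
        raise ValueError("page has no text boxes")
    return Box(
        min(b.left for b in boxes),
        min(b.top for b in boxes),
        max(b.right for b in boxes),
        max(b.bottom for b in boxes),
    )

def margins(boxes, page_width, page_height):
    """Margins in pixels: distance from the print space to each page edge."""
    ps = print_space(boxes)
    return {
        "left": ps.left,
        "top": ps.top,
        "right": page_width - ps.right,
        "bottom": page_height - ps.bottom,
    }

def to_mm(px, dpi):
    """Convert a pixel length to millimetres at the given scan resolution."""
    return px / dpi * 25.4
```

For a 600 × 1000 pixel page with text boxes `Box(100, 120, 500, 140)` and `Box(90, 200, 520, 900)`, the print space is `Box(90, 120, 520, 900)` and the margins are 90 (left), 120 (top), 80 (right) and 100 (bottom) pixels. Converting to millimetres with the scan resolution is what would make these values usable for a print layout.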

Lukas Gander didn’t want his presentation to be filmed, so there is no video of this talk.

Niall Anderson, BL + Mark-Oliver Fischer, BSB