Layout analysis

Layout analysis automatically identifies the structural elements of scanned documents by detecting the logical units that make up a page layout, such as articles, headings, captions, images, or tables. This information makes it possible to offer a better user experience when working with digitized documents. A newspaper, for example, can contain several articles on a single page that are not thematically related (e.g. on its title page). Once individual articles have been identified, they can be clustered by topic, related articles can be suggested to the user, and the articles can be integrated into existing content management systems. Article segmentation can also support search by allowing users to restrict a query to certain layout units, such as image captions or headlines.
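To illustrate searching within certain layout units, here is a minimal sketch in Python. It assumes that a layout analysis tool has already split a page into typed units; the `LayoutUnit` class, the `search_units` function, the unit type names, and the sample texts are all hypothetical and do not correspond to the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class LayoutUnit:
    """One logical unit found on a page (hypothetical structure)."""
    kind: str  # assumed type names, e.g. "headline", "article", "caption"
    page: int
    text: str


def search_units(units, query, kinds=None):
    """Return the units whose text contains the query (case-insensitive).

    If kinds is given, only units of those types are searched.
    """
    needle = query.lower()
    return [
        unit
        for unit in units
        if (kinds is None or unit.kind in kinds) and needle in unit.text.lower()
    ]


# Invented sample data standing in for the result of layout analysis.
units = [
    LayoutUnit("headline", 1, "Flood damages harbour district"),
    LayoutUnit("article", 1, "The flood reached the harbour shortly after midnight."),
    LayoutUnit("caption", 1, "Residents clearing debris near the harbour"),
    LayoutUnit("headline", 1, "Council approves new tram line"),
]

# Search only in headlines and image captions, skipping article bodies.
hits = search_units(units, "harbour", kinds={"headline", "caption"})
for unit in hits:
    print(unit.kind, "|", unit.text)
```

Without the `kinds` argument the same query would also match the article body, which shows why the unit type has to be stored alongside the recognized text.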

Compared to other automatic processing steps such as OCR, layout analysis is difficult to perform automatically, because layouts (of newspapers, for example) can be quite complex and may change over time. Many layout analysis tools therefore include a manual post-correction step and support users in the correction process.

Here we will present tools that are able to automatically detect and reconstruct these structural elements from scanned documents.