The graphical user interface of the Korrektor consists of three views displayed in individual windows: the Main Window (Figure 1), used for editing metadata, article and text information, the Word Highlighter (Figure 2), displaying the currently selected word in the scanned document, and the Issue Viewer (Figure 3), displaying the currently loaded issue.
The following subsections give a brief overview of the individual views. The most common correction functions are explained in section 4.
The Navigation Bar offers basic features like file operations (File), editing operations (Edit) and additional views (View).
The Metadata View (issue or page) shows information about the currently opened XML file. Upon opening a file containing a full issue, the following fields are available:
- Title: Newspaper title
- Date: Issue date
- Issue: Issue number
- Publisher: Newspaper publisher
- Edition: Newspaper edition
- Pages: Number of pages
In contrast, if the opened XML file contains only a single page, the following fields are available:
- Category: page category (e.g. title page, sports, politics)
- Date: Issue date for this page
- Page: Page number in issue
- Source: Page source (e.g. newspaper title)
All field values can be updated or filled in.
The Article View shows a list of the currently detected articles. The color column indicates the color in which each article is visualized in the Issue Viewer window. If an article contains a title block, the content of this block is visible in the title column. The pages column gives an overview of the pages where each article can be found (relevant if an article spans more than one page). Choosing one of the articles via a left mouse click loads it into the Article Editor, where it can be edited. The context menu (right click) offers two options:
- Edit Article: Edit an article in the Article Editor
- Remove Article: Remove an article and release all its associated blocks
The Article Editor is composed of three main components: article header, text editor panel and article footer. If either a whole article or a single block belonging to an article is selected, the frame of the text editor is drawn in the color of the respective article. If the currently selected block has not yet been assigned to any article, the frame of the text editor is painted in grey. The article header and footer areas contain metadata about the article. As with the frame color, if a single block belonging to an article is selected, the metadata shown is that of the respective article. For unassigned blocks, all fields within the article header and footer are empty.
The article header starts with four read-only fields which display the text content of the regions marked as roof title, title, subtitle and byline. A mouse click on any of these fields selects the respective block within the article. Its text is then automatically shown in the text editor and can be freely modified. The words marked as unsure by the OCR process are highlighted in red. If a single article block is selected (rather than a whole article), the button “Edit article…” is activated. Clicking this button switches to editing the whole article. This is useful, for example, for quickly switching to whole-article editing right after correcting the text of its titles and its byline.
Below the title fields there are several editable metadata fields, namely Author, Category, Location and Date/Time. The field Category is represented as a combo box and may only be assigned a value from a preset list of categories.
The text editor area allows the direct editing of the article text. A number of tools exist to facilitate efficient text editing:
Word correction tool:
Words which have been wrongly recognized by the OCR process may be corrected on an individual basis. This has an important consequence: entire words and word breaks may not be deleted or added. The reason for this restriction is simple: deleting or adding entire words may leave the program unable to maintain a two-way correspondence between the remaining or newly added words and their exact location in the document image.
The word at the current cursor position within the text editor is automatically marked in red in the Word Highlighter window, which shows an excerpt of the original document image. The automatic updating of the Word Highlighter window is meant to ensure that the article text can be corrected fluently, at a glance. Analogously, the content of the Issue Viewer pane is automatically updated to reflect the cursor position by marking the block containing the current word with a wide frame. CTRL + left mouse click on a word within the text editor takes the user directly to its corresponding block in the Issue Viewer. This is helpful, for example, for quickly finding small blocks in the Issue Viewer, which would otherwise be hard to see at lower magnification settings.
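The restriction above can be illustrated with a minimal Python sketch. It is not taken from the Korrektor itself: the `Word` record, its `box` layout and the function `correct_word` are hypothetical names chosen for illustration. The sketch only shows why a correction that keeps the number of words unchanged also keeps every word tied to its location in the image.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    text: str
    box: tuple  # (x, y, width, height) in image pixels; hypothetical layout


def correct_word(words, index, new_text):
    """Return a copy of `words` with the text of one word replaced.

    Corrections that are empty or contain whitespace are rejected, because
    they would remove a word or introduce a new word break.
    """
    if not new_text or any(ch.isspace() for ch in new_text):
        raise ValueError("a correction must consist of exactly one word")
    corrected = list(words)
    # Only the text changes; the bounding box stays attached to the word.
    corrected[index] = replace(words[index], text=new_text)
    return corrected
```

For example, replacing an OCR error such as "Tbe" with "The" keeps the bounding box of the word, while an attempt to replace it with "The big" is rejected.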
A right click with the mouse within the text editor area opens a context menu. The context menu contains several block editing functions which are described in more detail in section 3.3 (Issue Viewer – Context menu). Note that the function Delete Block is deactivated on purpose in order to prevent accidental deletions of entire text blocks (deleting blocks can thus only be performed in the Issue Viewer).
At the bottom of the Article Editor pane there are two tables: the upper one lists all footnotes attached to the article, whereas the lower one lists all attachments and their corresponding caption regions. All non-text components of an article (e.g. photographs, drawings) are denoted as attachments. The possible block types are described in section Issue Viewer.
The Word Highlighter view pane shows a small section of the document image. Its content is synchronized with the text editor pane as described in the previous section. In addition to the synchronization feature, the current word is highlighted in red for better visibility and faster editing.
The Issue Viewer shows the original document images. On top of the document image, the text blocks found by the automatic segmentation or manually created by the user are displayed semi-transparently in the color of the article to which they were assigned. Unassigned text blocks are depicted using hollow rectangles. Editing functions are available via left mouse click (multi-block selection), right mouse click (opening a context menu) and via drag and drop (assigning blocks to articles).
The current zoom level for the document image can be adjusted between 10% and 100% via the drop-down box in the upper left corner of the Issue Viewer. Zoom levels other than those in the list of presets can also be entered manually: left-click the zoom level box, type the desired value and confirm with the Enter (CR) key. The Issue Viewer panel also allows the zoom level to be increased or decreased in 10% increments using the keys “+” and “-”, respectively. Pressing the “w” key adjusts the zoom level such that the width of the document corresponds exactly to the current width of the Issue Viewer window. Analogously, the “h” key resizes the document image such that its height corresponds exactly to the height of the Issue Viewer window.
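The zoom arithmetic described above can be sketched in Python as follows. This is an illustration, not the Korrektor's implementation. All names are hypothetical, and it is an assumption that the “+” and “-” keys stop at the 10% and 100% limits. The text does not say whether the fit-to-width and fit-to-height results are limited to that range, so the sketch leaves them unclamped.

```python
MIN_ZOOM = 10    # percent, lower limit named in the text
MAX_ZOOM = 100   # percent, upper limit named in the text
ZOOM_STEP = 10   # percent per "+" or "-" key press


def clamp_zoom(percent):
    """Limit a zoom level to the range MIN_ZOOM..MAX_ZOOM."""
    return max(MIN_ZOOM, min(MAX_ZOOM, percent))


def step_zoom(percent, direction):
    """Zoom in (direction=+1, "+" key) or out (direction=-1, "-" key).

    Assumption: the keys have no effect beyond the limits.
    """
    return clamp_zoom(percent + direction * ZOOM_STEP)


def fit_zoom(image_size, window_size, key):
    """Zoom level (percent) at which the image fits the window.

    key "w" fits the width, key "h" fits the height.
    Sizes are (width, height) pairs in pixels.
    """
    index = 0 if key == "w" else 1
    return 100.0 * window_size[index] / image_size[index]
```

With a 2000 x 3000 pixel image in a 1000 x 600 pixel window, `fit_zoom` yields 50.0 for “w” and 20.0 for “h”.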
Page Segmentation Display Panel
The page blocks discovered by the automated page segmentation process or manually created by the user are displayed via semi-transparent rectangular frames. Depending on the status of each block, its background color will change accordingly as follows:
- blue: Blocks which have not yet been assigned to an article.
- non-blue: Blocks which have been assigned to an article. Their background color corresponds to the color of the containing article. The number in the upper left corner of the block represents the position of the block in the article’s reading order. Non-text blocks, captions and text boxes are numbered separately, as they can generally be read in any order. The ordering of such blocks is initially computed automatically based on the document layout. The position of a block within the reading order of an article can be changed via the keys “<” and “>”. The “<” key moves a block to an earlier position within the reading order, while the “>” key moves a block towards the end of the reading order. The ordering of nearby blocks is then updated automatically to best reflect the changes performed by the user.
- black frame: An additional black frame around a block signals that the respective block belongs to the current user selection. A multi-block selection can be processed as a whole via the functions available in the context menu (see Multi-Block-Functions).
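The effect of the “<” and “>” keys on the reading order can be sketched in Python. The function name is hypothetical, and the sketch assumes that each key press moves the block by exactly one position and that the neighbouring block takes the vacated place; the text only states that nearby blocks are reordered automatically.

```python
def move_in_reading_order(order, block_id, key):
    """Return a new reading order with `block_id` shifted by one position.

    "<" moves the block towards the start, ">" towards the end.
    A block already at the respective end stays where it is.
    """
    if key not in ("<", ">"):
        raise ValueError("key must be '<' or '>'")
    position = order.index(block_id)
    target = position - 1 if key == "<" else position + 1
    reordered = list(order)
    if 0 <= target < len(order):
        # Swap with the neighbour so that its number is updated as well.
        reordered[position], reordered[target] = (
            reordered[target],
            reordered[position],
        )
    return reordered
```

The input list is left unchanged, which makes it easy to compare the order before and after a key press.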
Context Menu for Single Block Selections
The context menu is displayed upon pressing the right mouse button. A different context menu is available depending on whether the current user selection consists of a single or multiple page blocks. The current section describes the functionality available for single block selections. The functions available in the context menu are:
- Edit Article: This option is only available if the selected block has already been assigned to an article. It loads the entire article (i.e. all text blocks belonging to the article) into the editor, where it can be modified by the user.
- Edit Block: The selected block is loaded into the text editor, where it can be inspected and modified. A simpler equivalent is a single left mouse click on the block.
- Create Article: Creates a new article from the selected block. The created article is automatically loaded into the text editor. Note that this option is also available for blocks already assigned to articles. In this case the block is removed from its containing article automatically. The block is marked directly as article title. The function “Create Article” may also be activated via the keyboard combination CTRL+a.
- Exclude: Removes a block from an article. A simpler equivalent is pressing the “Backspace” key when a single block is selected.
- Undo Selection: Removes a block from a multi-block selection (see 3.3.3). This menu option is only available for blocks belonging to a multi-block selection.
- Add to Selection: Only available for blocks which are not yet part of the current multi-block selection. It adds the respective block to the multi-block selection.
- Category: This menu option is only available for blocks already assigned to an article. It allows the selected block to be labeled with a category. The list of all available categories may be loaded via the application preferences.
- Type: Sets one of several pre-defined physical types for the selected block. The following page block types are available:
  - Text: Regular text block
  - Graphic: Line art (hand drawing)
  - Picture: Halftone image or photograph
  - Table: Any kind of table
- Label: Only available for blocks marked as belonging to articles. One of the following logical labels may be chosen for text blocks (all other block types are automatically marked as attachment):
  - Roof title: Small or underlined title usually placed above the main title of an article
  - Title: Main title of an article
  - Subtitle: Subtitle of an article
  - Byline: Gives the name, the date, and often the position of the writer of the article. Bylines are traditionally placed between the headline and the text of the article, although some printed media place bylines at the bottom of the article.
  - Lead: Lead paragraph(s). It usually occurs together with the headline or title. It precedes the main body of the article and gives the reader the main idea of the story.
  - Body: Body of an article, usually containing the bulk of its text
  - Intermediate Title: All intermediary titles following the subtitle and dividing longer articles into parts
  - Footnote: Footnote assigned to an article
  Depending on their label, blocks are displayed and handled differently in the Article Editor pane (see section 3.1).
- Delete Block: Completely discards an existing block from the document page. The deleted block is afterwards no longer available for any kind of processing. A shortcut for this function is the SHIFT+Backspace key combination. Note that any text contained in the deleted block is still remembered as being present at the respective location, albeit “invisibly”. Consequently, if a new block is created at the same position, the new text block will contain the portion of the text located within its bounds.
Context Menu for Multi-Block Selections
The Korrektor allows the processing of multiple blocks at once via multi-block selections and a corresponding context menu. Multiple blocks can be selected by pressing the left mouse button and dragging the mouse with the button held down. The blocks located mostly within the drawn rectangle are marked as belonging to the current selection. Note that no key may be pressed during the selection operation. The selected blocks can afterwards be modified as a whole via the following options from the context menu:
- Create Block: Only available immediately after the multi-block selection was performed. It allows the creation of a new block having one of the following types: Text, Picture, Graphic, Table. The result is similar to that which may be obtained via the extended keyboard combination ALT + mouse selection (see section 3.3.5). Note that Create Block does not technically belong to the category of multi-block functions. This is because the selected blocks are not modified in any way (i.e. they are not merged or incorporated into the new block).
- Single Block Menu: This menu item allows one to access the context menu options for single blocks, if so desired. The context menu for single blocks was presented in the previous section.
- Merge Blocks: All blocks belonging to the selection are merged into a single block. In the process, the old blocks are discarded. The new bounding box of the merged region is computed as the smallest box containing all selected blocks. An alternative shortcut for this option is the extended shortcut CTRL + ALT + mouse selection (see section 3.3.5).
- Create Article: Creates a new article from the selected blocks. This function may also be called via the keyboard shortcut CTRL+a.
- Exclude All: All selected blocks are removed from their containing article(s) and remain as independent, unassigned blocks. The equivalent shortcut for this context menu item is the Backspace key.
- Undo Selection: The block selection is reset (i.e. all selected blocks are removed from the selection).
- Category: Set the category field for the selected blocks. If all selected blocks already have the same category, the respective category will be shown as marked in the category list. If some of the selected blocks have different categories, no default category will be marked.
- Type: Set the block type for all selected blocks. If all selected blocks have the same type, the respective type will be shown as marked in the list of allowable types, otherwise no type will be marked by default. More information about the possible physical block types can be found in section 3.3.2.
- Label: Set the block logical label for all selected blocks. If all selected blocks already have the same logical label, it will be shown as marked in the label list, otherwise no label will be marked by default. More information about the possible logical labels may be found in section 3.3.2.
- Delete Blocks: Discard the selected blocks completely. This function may also be called via the keyboard shortcut SHIFT+Backspace. Another alternative is holding the SHIFT key pressed while performing a multi-block selection with the mouse so as to immediately discard all blocks in the selection rectangle (see also section 3.3.5).
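The two area tests used in this section, “mostly within” for a plain mouse selection and “completely within” for the SHIFT variant, can be sketched in Python. The `Rect` type and both function names are hypothetical, and reading “mostly” as “more than half of the block's area” is an assumption; the text does not give the exact threshold.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def area(rect):
    """Area of a rectangle; empty or inverted rectangles count as 0."""
    return max(0, rect.right - rect.left) * max(0, rect.bottom - rect.top)


def intersection(a, b):
    """Overlap of two rectangles (may be empty, i.e. have area 0)."""
    return Rect(max(a.left, b.left), max(a.top, b.top),
                min(a.right, b.right), min(a.bottom, b.bottom))


def mostly_inside(block, selection):
    """True if more than half of the block lies inside the selection."""
    block_area = area(block)
    if block_area == 0:
        return False
    return area(intersection(block, selection)) / block_area > 0.5


def fully_inside(block, selection):
    """True if the block lies completely inside the selection."""
    return (selection.left <= block.left and selection.top <= block.top
            and block.right <= selection.right
            and block.bottom <= selection.bottom)
```

A block that straddles the border of the drawn rectangle can therefore be selected by a plain mouse selection without being deleted by a SHIFT selection over the same area.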
Drag & Drop Functionality
In addition to the functionality offered in the context menus, a few common operations may also be performed comfortably via drag and drop. The following drag & drop-based operations are supported:
- Assigning a block to an article or to another article: Press and hold the left mouse button over an existing block. Drag the block (i.e. move the mouse with the left button still pressed) on top of another block which has already been assigned to an article and then release the mouse button. The initial block will then be assigned to the respective article. This operation works exactly the same for independent as well as previously assigned blocks. Alternatively, it is possible to drag a block directly on top of the text editor area, in which case the block will be assigned to the article opened there.
- Marking a block as caption: If a text block is dragged and dropped (see above) on top of an attachment (e.g. photograph, line art or table), it is automatically labeled as caption for the respective attachment. This works for all blocks, regardless of whether they were already part of the article containing the non-text attachment.
- Removing a block from an article: Dragging and dropping (see above) a block assigned to an article on top of an unassigned block marks the former as unassigned. Note that this functionality is equivalent to the Exclude function available in the context menu.
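The three drag & drop rules above can be summarized in a short Python sketch. It is only an illustration: the `Block` record, the label string "Caption" and the function `drop_block` are hypothetical. The order in which the rules are checked, and the choice to leave an existing label untouched when a block is excluded, are assumptions not stated in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Non-text block types, which the text treats as attachments.
ATTACHMENT_TYPES = {"Picture", "Graphic", "Table"}


@dataclass
class Block:
    block_type: str = "Text"
    article: Optional[str] = None  # None means "unassigned"
    label: Optional[str] = None


def drop_block(dragged, target):
    """Apply the drag & drop rules and report which rule was used."""
    if target.article is None:
        # Dropped on an unassigned block: same effect as Exclude.
        dragged.article = None
        return "excluded"
    # Dropped on an assigned block: join the article of the target.
    dragged.article = target.article
    if dragged.block_type == "Text" and target.block_type in ATTACHMENT_TYPES:
        # A text block dropped on an attachment becomes its caption.
        dragged.label = "Caption"
        return "caption"
    return "assigned"
```

Dropping a block on the text editor area is not covered here, since it depends on which article is currently open in the editor.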
We have previously seen that it is possible to open a single text block for editing via the Edit Block item from the context menu. The same effect can be achieved via a simple left mouse click on a text block. In case the CTRL key is pressed during the mouse click, the whole article is loaded into the text editor.
Three modifier keys (CTRL, SHIFT and ALT) may be used in conjunction with the mouse as part of extended shortcuts so as to facilitate the execution of common operations. The possible shortcuts are described in the following.
CTRL: The CTRL key is generally used to signal that a certain operation does not merely refer to a single block, but to its entire containing article. The following extended shortcuts are currently supported:
- CTRL + left mouse button: The whole article (not only the clicked block) will be loaded into the text editor. The view will be automatically centered on the first block (in reading order) of the article.
- CTRL + Drag&Drop: The whole article containing the dragged block (not only the dragged block itself) will be merged into the destination article.
- CTRL + ALT + mouse (area) selection: If during the mouse selection of a rectangular area both keys CTRL and ALT are kept pressed, all blocks within the respective area will be merged into a new block. If all selected blocks belonged to the same article, then the new block will also belong to the respective article.
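The merge operation can be sketched in Python as follows. The dictionary layout and the function name are hypothetical. The bounding box follows the rule given for Merge Blocks (the smallest box containing all selected blocks). Leaving the merged block unassigned when the source blocks belong to different articles is an assumption, since the text only covers the case in which all blocks share one article.

```python
def merge_blocks(blocks):
    """Merge blocks into one block covering all of them.

    Each block is a dict with a "box" (left, top, right, bottom)
    and an "article" (None for unassigned blocks).
    """
    if not blocks:
        raise ValueError("nothing to merge")
    boxes = [block["box"] for block in blocks]
    # Smallest box that contains every selected block.
    merged_box = (
        min(box[0] for box in boxes),
        min(box[1] for box in boxes),
        max(box[2] for box in boxes),
        max(box[3] for box in boxes),
    )
    articles = {block["article"] for block in blocks}
    # Keep the article only if all blocks agree on it (assumption otherwise).
    article = articles.pop() if len(articles) == 1 else None
    return {"box": merged_box, "article": article}
```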
SHIFT: The SHIFT key is generally meant for actions related to the deletion or removal of single or multiple blocks. The following extended shortcuts are currently supported:
- SHIFT + mouse (area) selection: By keeping the SHIFT key pressed while performing an area selection operation with the mouse, all blocks located completely within the bounds of the selected rectangle will be deleted.
ALT: The ALT key facilitates functionality related to the creation of new blocks and articles. The following extended shortcuts are currently supported:
- ALT + mouse area selection: If the ALT key is kept pressed during the selection of a certain area with the mouse, then a new text block will be created from the respective rectangle. All words located within the marked area will be included in the new block, as long as they have not already been assigned to an existing block.
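How a newly created block picks up its words can be sketched in Python. The word records and the function name are hypothetical, and it is an assumption that a word counts as "within the marked area" only if its bounding box lies completely inside the rectangle. Words of a previously deleted block are modelled as unassigned words, which is why a new block drawn at the same position would pick them up again.

```python
def words_for_new_block(words, area):
    """Return the words that a new block drawn over `area` would contain.

    Each word is a dict with "text", "box" (left, top, right, bottom)
    and "block" (None if the word belongs to no block).
    """
    left, top, right, bottom = area
    picked = []
    for word in words:
        w_left, w_top, w_right, w_bottom = word["box"]
        inside = (left <= w_left and top <= w_top
                  and w_right <= right and w_bottom <= bottom)
        # Words already owned by an existing block are skipped.
        if inside and word["block"] is None:
            picked.append(word)
    return picked
```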