Output

The report generated by the evaluation tool contains:

  • The estimated CER (character error rate) and WER (word error rate) for the sample (in several flavors).
  • The best alignment between the ground-truth text and the OCR text.
  • Detailed statistics on the number of mistakes for every character.
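To make the two rates concrete, here is a minimal sketch of one common definition: the edit (Levenshtein) distance between the OCR text and the ground truth, divided by the length of the ground truth, computed over characters for CER and over whitespace-separated words for WER. The function names are hypothetical, and the tool reports several flavors whose exact definitions may differ from this one.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(ground_truth, ocr):
    # Characters are the unit of comparison.
    return error_rate(list(ground_truth), list(ocr))


def wer(ground_truth, ocr):
    # Whitespace-separated tokens are the unit of comparison (an assumption).
    return error_rate(ground_truth.split(), ocr.split())
```

With the ground truth "the quick brown fox" and the OCR text "the qu1ck brown f0x", two wrong characters out of 19 give a CER of about 10.5%, yet two of the four words are wrong, so the WER is 50%. A single wrong character is enough to spoil a whole word, which is why the WER is usually much higher than the CER.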

For example, this sample output shows a relatively low character error rate (9.25%), which nevertheless leaves almost one out of four words wrong (23.51%). The alignments are shown in two parallel columns (the ground-truth text on the left-hand side and the OCR text on the right-hand side), where:

  • Replaced content (substitutions) is shown in a red foreground, and the replacing text appears when the mouse pointer moves over the highlighted text (the associated segment in the parallel text is highlighted as well).
  • Spurious content (insertions) is marked with an aquamarine background in the OCR text.
  • Lost content (deletions) is marked with an aquamarine background in the ground-truth text.
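The three kinds of highlighted segments can be illustrated with a short sketch that labels the differences between two strings. It uses Python's standard difflib.SequenceMatcher as a stand-in aligner; this is an assumption for illustration only, since difflib does not guarantee a minimum-edit-distance alignment and is not the aligner used by the evaluation tool. The function name classify_differences is hypothetical.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to the terms used in the report.
LABELS = {
    "replace": "substitution",  # replaced content
    "insert": "insertion",      # spurious content, present only in the OCR text
    "delete": "deletion",       # lost content, present only in the ground truth
}


def classify_differences(ground_truth, ocr):
    """Return (label, ground-truth segment, OCR segment) for each difference."""
    matcher = SequenceMatcher(None, ground_truth, ocr, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # matching segments are not highlighted
        differences.append((LABELS[tag], ground_truth[i1:i2], ocr[j1:j2]))
    return differences


for item in classify_differences("abcdef", "abXdef"):
    print(item)
```

An insertion always has an empty ground-truth segment and a deletion an empty OCR segment, which matches the column in which each one is highlighted.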

Finally, a table with all characters in lexicographic order displays the absolute and relative number of mistakes for each one. The hexadecimal code is listed to facilitate the identification of those characters which are non-printable (such as blanks) or not properly displayed by the browser.
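As a rough sketch of how such a table could be assembled, the following function takes per-character counts and produces one row per character with its hexadecimal code, absolute number of mistakes, and relative error. The input layout (a mapping from character to an occurrences/errors pair), the function name, and the choice of sorting by code point are all assumptions made for illustration; how the tool attributes mistakes to characters is not described here.

```python
def character_table(stats):
    """Build table rows from stats: {char: (occurrences, errors)}.

    Each row is (displayed char, hex code, occurrences, errors, error %).
    """
    rows = []
    for ch in sorted(stats):  # sorted by code point (assumed ordering)
        occurrences, errors = stats[ch]
        percent = 100.0 * errors / occurrences if occurrences else 0.0
        # Blanks and non-printable characters are left empty in the first
        # column; the hexadecimal code still identifies them.
        shown = ch if ch.isprintable() and not ch.isspace() else ""
        rows.append((shown, format(ord(ch), "x"), occurrences, errors,
                     round(percent, 2)))
    return rows


for row in character_table({"e": (10, 1), " ": (4, 2)}):
    print(row)
```

In this example the blank has no visible glyph in the first column, but its row can still be recognized by the code 20.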