The Named Entities Recognition Tool (NERT) marks and extracts named entities (persons, locations and organisations) from a text file. It works by supervised learning: it must first be trained on a manually tagged subset of the relevant material before it can be applied to a corpus of text.
In a mass digitisation workflow, the tool can be used to mark up named entities in the digitised text before indexing for retrieval. Entities are classified as PER (person), LOC (location) or ORG (organisation).
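To make the idea of tagged text concrete, the following minimal Python sketch shows how labelled tokens can be collapsed into entities. It assumes a simple token/label representation in which non-entity tokens carry the label O; this is an illustrative assumption, not a description of NERT's actual input or output format, and the function name `extract_entities` and the sample sentence are hypothetical.

```python
from itertools import groupby


def extract_entities(tagged_tokens):
    """Collapse runs of consecutive tokens that share a non-"O" label.

    tagged_tokens is a list of (token, label) pairs, where label is
    "PER", "LOC", "ORG" or "O" (assumed marker for "not an entity").
    Returns a list of (entity_text, label) pairs in document order.
    """
    entities = []
    # groupby yields one group per run of identical consecutive labels.
    for label, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label != "O":
            text = " ".join(token for token, _ in group)
            entities.append((text, label))
    return entities


# Hypothetical sample sentence, tagged by hand for illustration.
sample = [
    ("Willem", "PER"), ("van", "PER"), ("Oranje", "PER"),
    ("reisde", "O"), ("naar", "O"),
    ("Delft", "LOC"), (".", "O"),
]

print(extract_entities(sample))
```

Note that this simple scheme merges two adjacent entities of the same type into one; labelling schemes that mark entity boundaries explicitly avoid that limitation.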
In addition, version 2.0 of the tool comes with a Named Entity matcher module, making it possible to group name variants and to assign modern word forms to old spelling variants (where, for instance, a town may have changed its name over time).
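As a rough illustration of what grouping name variants involves, the Python sketch below assigns each historical spelling to the most similar modern form using plain string similarity from the standard library. This is not the algorithm used by the NERT matcher module; the function names, the similarity threshold and the sample names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_variants(names, modern_forms, threshold=0.75):
    """Assign each name to its most similar modern form.

    Names whose best similarity falls below the (assumed) threshold
    are returned separately as unmatched.
    """
    groups = {form: [] for form in modern_forms}
    unmatched = []
    for name in names:
        best = max(modern_forms, key=lambda form: similarity(name, form))
        if similarity(name, best) >= threshold:
            groups[best].append(name)
        else:
            unmatched.append(name)
    return groups, unmatched


# Illustrative data only: historical spellings and candidate modern forms.
historical = ["Amsterdamme", "Amstelredam", "Leyden", "Utrecht"]
modern = ["Amsterdam", "Leiden"]

groups, unmatched = group_variants(historical, modern)
print(groups)
print(unmatched)
```

Surface similarity of this kind can group spelling variants, but it cannot link a town to a later, unrelated name; that case needs a lookup list or similar external knowledge.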
The tool is based on Stanford University’s Named Entity Recogniser and has been extended for use in IMPACT. Besides the matcher module, the main extension is a module that reduces spelling variation within the processed data, which improves discoverability.
For more information on the working of the Stanford tool, see Finkel, Grenager and Manning (2005) or visit the tool’s website.