Produced by: Instituut voor Nederlandse Lexicologie (INL)
In a mass digitisation workflow, this tool can be used to mark up named entities in the digitised text before it is indexed for retrieval. Entities are classified as PER (person), LOC (location) or ORG (organisation).
The Named Entities Recognition Tool (NERT) marks and extracts named entities (persons, locations and organisations) from a text file. It works by supervised learning: it must first be trained on a manually tagged subset of relevant material before it can be applied to a corpus of text.
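As a rough illustration of what tagged material looks like once loaded, the sketch below assumes one label per token (PER, LOC, ORG, or O for tokens outside any entity). This layout is common for taggers of this kind, but NERT's actual input format is described in its user manual and may differ. The function name `extract_entities` and the sample sentence are hypothetical and are not part of NERT.

```python
from typing import Iterable, List, Tuple


def extract_entities(tagged_tokens: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive tokens that share a non-"O" label into one entity."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = "O"
    for token, label in tagged_tokens:
        if label != current_label:
            # The label changed: close the entity collected so far, if any.
            if current_label != "O":
                entities.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = label
        if label != "O":
            current_tokens.append(token)
    # Close an entity that runs up to the end of the input.
    if current_label != "O":
        entities.append((" ".join(current_tokens), current_label))
    return entities


# Hypothetical manually tagged sentence (assumed token/label layout).
sample = [
    ("Willem", "PER"), ("van", "PER"), ("Oranje", "PER"),
    ("woonde", "O"), ("in", "O"), ("Delft", "LOC"), (".", "O"),
]

if __name__ == "__main__":
    for text, label in extract_entities(sample):
        print(label, text)
```

With this simple scheme, two adjacent entities of the same type are merged into one; a tagging scheme with explicit begin/inside markers would avoid that.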
In addition, version 2.0 of the tool comes with a Named Entity matcher module, making it possible to group name variants and to assign modern word forms to old spelling variants (where, for instance, a town may have changed its name over time).
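The idea of grouping name variants under a modern form can be sketched as follows. This is a minimal illustration and not the matcher module's actual algorithm: the rewrite rules in `RULES`, the function names and the lookup table of modern forms are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical rewrite rules mapping historical spellings to a normalised key.
RULES: List[Tuple[str, str]] = [("ae", "aa"), ("ck", "k"), ("y", "i")]


def normalise(name: str) -> str:
    """Lower-case a name and apply each rewrite rule in order."""
    key = name.lower()
    for old, new in RULES:
        key = key.replace(old, new)
    return key


def group_variants(names: Iterable[str], modern_forms: Dict[str, str]) -> Dict[str, List[str]]:
    """Group names that share a normalised key and label each group.

    The label is the modern form listed for the key, or the first variant
    seen when no modern form is known.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[normalise(name)].append(name)
    return {
        modern_forms.get(key, variants[0]): variants
        for key, variants in groups.items()
    }


if __name__ == "__main__":
    names = ["Haerlem", "Haarlem", "Leyden", "Leiden", "Amsterdam"]
    modern = {"haarlem": "Haarlem", "leiden": "Leiden"}  # assumed lookup table
    for label, variants in group_variants(names, modern).items():
        print(label, variants)
```

Rule-based normalisation only catches spelling variation. A town that was renamed outright would need an explicit entry linking the old name to the new one.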
The tool is based on Stanford University’s Named Entity Recogniser, and has been extended for use in IMPACT. In addition to the matcher module, the other main extension is a module to reduce spelling variation within processed data, leading to improved discoverability.
For more information on the working of the Stanford tool, see Finkel, Grenager and Manning (2005) or visit the tool’s website.
- IMPACT deliverable D-EE2.6 Lexicon Cookbook (December 2011)
- IMPACT deliverable D-EE2.6 NERT User Manual (November 2011)
- IMPACT deliverable D-EE2.6 NE Work In IMPACT (December 2011)
- Landsbergen, F. Named Entity Work in IMPACT. IMPACT Final Conference 2011, 24-25 October, London, UK
The Stanford tool is licensed under the GNU GPL v2 or later.
- NERT at Impact Demonstrator Platform
- NERT at DigitWiki