Named Entities Recognition Tool (NERT)

Produced by: Instituut voor Nederlandse Lexicologie (INL)

Scenario

In a mass digitisation workflow, this tool can be used to mark up named entities in the digitised text before it is indexed for retrieval. Entities are classified as PER (person), LOC (location) or ORG (organisation).
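As an illustration of what such mark-up can look like, the following minimal sketch wraps labelled tokens in inline tags. The tag format (`<PER>…</PER>`), the function name `mark_up` and the use of "O" for non-entity tokens are assumptions made for this example; they are not taken from NERT's documented output format.

```python
from typing import List, Tuple

def mark_up(tokens: List[Tuple[str, str]]) -> str:
    """Wrap each run of consecutive tokens with the same entity label
    (PER, LOC or ORG) in one inline tag. "O" marks non-entity tokens.
    Hypothetical output format, for illustration only."""
    out = []
    i = 0
    while i < len(tokens):
        word, label = tokens[i]
        if label == "O":
            out.append(word)
            i += 1
            continue
        # Collect the whole run of tokens carrying this label.
        j = i
        span = []
        while j < len(tokens) and tokens[j][1] == label:
            span.append(tokens[j][0])
            j += 1
        out.append(f"<{label}>{' '.join(span)}</{label}>")
        i = j
    return " ".join(out)

sentence = [("Willem", "PER"), ("van", "PER"), ("Oranje", "PER"),
            ("woonde", "O"), ("in", "O"), ("Delft", "LOC")]
print(mark_up(sentence))
# <PER>Willem van Oranje</PER> woonde in <LOC>Delft</LOC>
```

Note that this simple scheme merges two adjacent entities of the same class into one span; a real tagger would need begin/inside markers or similar to keep them apart.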

Abstract

The Named Entities Recognition Tool (NERT) marks and extracts named entities (persons, locations and organisations) from a text file. The tool works by supervised learning: it must first be trained on a manually tagged subset of relevant material before it can be applied to a corpus of text.

In addition, version 2.0 of the tool comes with a Named Entity matcher module, making it possible to group name variants and to assign modern word forms to old spelling variants (where, for instance, a town may have changed its name over time).
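The idea of grouping name variants under a modern form can be sketched with plain string similarity. This is only an illustration using Python's standard `difflib`; it is not the algorithm of the NERT matcher module, and the function name `match_variants`, the threshold of 0.75 and the sample names are assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List

def match_variants(variants: List[str], modern_forms: List[str],
                   threshold: float = 0.75) -> Dict[str, List[str]]:
    """Group each variant under the most similar modern form.
    Variants below the (assumed) similarity threshold keep their own group."""
    groups = defaultdict(list)
    for variant in variants:
        best, best_score = None, 0.0
        for modern in modern_forms:
            score = SequenceMatcher(None, variant.lower(), modern.lower()).ratio()
            if score > best_score:
                best, best_score = modern, score
        if best is not None and best_score >= threshold:
            groups[best].append(variant)
        else:
            groups[variant].append(variant)  # no convincing modern form found
    return dict(groups)

groups = match_variants(["Amstelredam", "Rotterdamme", "Leyden"],
                        ["Amsterdam", "Rotterdam"])
# "Amstelredam" is grouped under "Amsterdam", "Rotterdamme" under "Rotterdam";
# "Leyden" has no similar modern form in this list and stays on its own.
```

String similarity only covers spelling variants. A town that changed its name altogether shares little or nothing with its modern form, so that case would need an explicit lookup table of old and new names.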

The tool is based on Stanford University’s Named Entity Recognizer and has been extended for use in IMPACT. Besides the matcher module, the main extension is a module that reduces spelling variation within the processed data, so that variant spellings of the same word are treated alike and entities become easier to find.
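One common way to reduce spelling variation is to map each word to a normalised key with a small set of rewrite rules, so that variant spellings collapse onto the same key. The sketch below shows the principle only; the rules, the function name `reduce_variation` and the example words are hypothetical and do not reproduce the rules used by NERT.

```python
import re

# Hypothetical rewrite rules for illustration, applied in order.
RULES = [
    (re.compile(r"ae"), "aa"),
    (re.compile(r"ck"), "k"),
    (re.compile(r"gh"), "g"),
    (re.compile(r"y"), "ij"),
]

def reduce_variation(word: str) -> str:
    """Return a normalised key: lower-case the word, then apply each rule."""
    key = word.lower()
    for pattern, replacement in RULES:
        key = pattern.sub(replacement, key)
    return key

print(reduce_variation("Haerlem"), reduce_variation("Haarlem"))
# haarlem haarlem
print(reduce_variation("Vranckrijck"), reduce_variation("Vrankryk"))
# vrankrijk vrankrijk
```

Because the key is used only for comparison, the original spelling in the text can be left untouched.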

For more information on how the Stanford tool works, see Finkel, Grenager and Manning (2005) or visit the tool’s website.

Publications


Availability

The Stanford tool is licensed under the GNU GPL v2 or later.

Further resources