The Impact Centre of Competence provides historical and named-entity lexica for the following languages. In addition, we offer access to the different corpora:
The current lexicon consists of 28,857 lexical entries developed in LeXtractor.
The Historical Lexicon of English covers the period from 1497 to 1900.
The German historical corpus consists of 510 texts varying in length and including different genres.
The ground truth material for Polish consists of books published from 1617 to 1756.
The dataset contains material published from the second half of the eighteenth century to the end of the nineteenth century.
Fourteen works of Spanish Literature and a dictionary were selected for the IMPACT Demonstrator dataset.
It compiles over one hundred books and is accompanied by a complementary lexicon that links more than 10 thousand lemmas.
The reference corpus of historical Slovene goo300k contains the text from 1,100 pages sampled from the IMP collection with hand-validated linguistic annotation.
What is a lexicon?
A lexicon is a structured, machine-usable repository of relevant linguistic knowledge about words in a language. A lexicon contains historical variants (orthographical variants, inflected forms) and links them to a corresponding dictionary form in modern spelling (known as a ‘modern lemma’). In this way, a user can search for a modern word (‘water’) and receive results that take into account all historical variants in that language (‘wæter’, ‘weter’, ‘waterr’, ‘watre’, etc.).
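The lookup described above can be illustrated with a minimal Python sketch. It is not the format or the software used for the IMPACT lexica: the entry list, the function names (`build_index`, `expand_query`, `search`) and the sample sentence are all hypothetical, chosen only to show how linking variants to a modern lemma lets a modern query match historical spellings.

```python
from collections import defaultdict

# Hypothetical toy entries: (historical variant, modern lemma).
# A real lexicon holds far more information per entry.
ENTRIES = [
    ("wæter", "water"),
    ("weter", "water"),
    ("waterr", "water"),
    ("watre", "water"),
]


def build_index(entries):
    """Map each modern lemma to the set of its historical variants."""
    index = defaultdict(set)
    for variant, lemma in entries:
        index[lemma].add(variant)
    return index


def expand_query(index, modern_word):
    """Return the modern word plus every historical variant linked to it."""
    return {modern_word} | index.get(modern_word, set())


def search(index, modern_word, text):
    """Return the tokens in `text` that match the word or any of its variants."""
    forms = expand_query(index, modern_word)
    return [token for token in text.lower().split() if token in forms]


index = build_index(ENTRIES)
# Invented sample sentence, used only to exercise the lookup.
hits = search(index, "water", "The watre was colde and the weter depe")
```

Here a query for ‘water’ is expanded to the modern form and its linked variants before matching, so `hits` holds the two historical spellings found in the sample sentence. Whitespace tokenisation is a simplification; real historical text would need proper tokenisation and normalisation.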
These resources are available for download to members of the Impact Centre of Competence and registered users.