Language Resources

The IMPACT Centre of Competence provides historical and named-entity lexica for the following languages. In addition, we offer access to several corpora.


A lexicon is a structured, machine-usable repository of relevant linguistic knowledge about words in a language. A lexicon will contain historical variants (orthographical variants, inflected forms) and link them to a corresponding dictionary form in modern spelling (known as a ‘modern lemma’).
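The variant-to-lemma link described above can be pictured as a simple lookup structure. The following is a minimal sketch in Python, not the format of the IMPACT lexica themselves: the names `LexiconEntry`, `build_variant_index` and `lookup` are hypothetical, and the sample word forms are invented for illustration rather than taken from any of the lexica listed here.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One modern lemma with the historical word forms recorded for it."""
    modern_lemma: str
    variants: set[str] = field(default_factory=set)


def build_variant_index(entries):
    """Map each lowercased word form to the set of modern lemmas it may belong to."""
    index = {}
    for entry in entries:
        # The modern lemma is indexed too, so modern spellings also resolve.
        for form in entry.variants | {entry.modern_lemma}:
            index.setdefault(form.lower(), set()).add(entry.modern_lemma)
    return index


def lookup(index, word_form):
    """Return the candidate modern lemmas for a word form (empty set if unknown)."""
    return index.get(word_form.lower(), set())


# Illustrative entries only (assumption): not taken from any IMPACT lexicon.
entries = [
    LexiconEntry("virtue", {"vertue", "vertu", "virtues"}),
    LexiconEntry("love", {"loue", "loued", "loveth"}),
]
index = build_variant_index(entries)
```

With such an index, a call like `lookup(index, "Vertue")` resolves an orthographical variant to its modern lemma, and `lookup(index, "loued")` does the same for an inflected form. Returning a set rather than a single string leaves room for forms that could belong to more than one lemma.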

Historical Lexica


Bulgarian Lexicon

The current lexicon consists of 28,857 lexical entries.


Czech Lexicon

The period covered by the Historical Lexicon of Czech is 1800 – 1900.


Dutch Lexicon

The period covered by the Historical Lexicon of Dutch is 1600 – 1940.


English Lexicon

The period covered by the Historical Lexicon of English is 1497 – 1900.


French Lexicon

The Historical Lexicon of French is focused on the 17th century.


German Lexicon

The German lexicon consists of 510 texts spanning different genres.


Polish Lexicon

The ground truth material for Polish covers the period from 1617 to 1756.


Slovene Lexicon

The dataset covers the second half of the 18th century and the 19th century.


Spanish Lexicon

The Spanish lexicon is composed of 14 works of Spanish literature and a dictionary.


Latin Lexicon

The Latin Lexicon was produced by the University of Alicante.


IMPACT-es diachronic corpus

IMPACT-es Diachronic Corpus

The IMPACT-es diachronic corpus of historical Spanish compiles over one hundred books. It is accompanied by a complementary lexicon that links more than 10,000 lemmas with their historical variants.

Slovene Corpora

IMP Slovene Corpora

The reference corpus of historical Slovene goo300k contains the text from 1,100 pages sampled from the IMP collection with hand-validated linguistic annotation.

Corpora search services

Diasearch – Diachronic corpus search

Diasearch is an online service which enables users to perform linguistically enriched queries on a collection of historical texts. It is currently available for the Spanish IMPACT-es corpus.

Golden Age Sonnet Search Service

The Golden Age sonnet search service allows the exploitation of a TEI-based Spanish poetry corpus with 5,078 sonnets written in the 16th and 17th centuries.