IMPACT-es diachronic corpus
Abstract
The IMPACT-es diachronic corpus of historical Spanish compiles over one hundred books, containing approximately 8 million words, together with a complementary lexicon that links more than 10 thousand lemmas with attestations of the different variants found in the documents. The textual corpus and the accompanying lexicon have been released under an open license (Creative Commons BY-NC-SA) to permit their intensive exploitation in linguistic research.
Approximately 7% of the words in the corpus (a selection aimed at enhancing the coverage of the most frequent word forms) have been annotated with their lemma, part of speech, and modern equivalent.
Publications
- Sánchez-Martínez, F., Martínez-Sempere, I., Ivars-Ribes, X., Carrasco, R.C.: An open diachronic corpus of historical Spanish. Published in Language Resources and Evaluation. Available at http://link.springer.com/article/10.1007%2Fs10579-013-9239-y
Availability
IMPACT-es corpus and lexicon by Universitat d’Alacant/Universidad de Alicante and Fundación Biblioteca Virtual Miguel de Cervantes is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (see file LICENSE CC BY-NC-SA.txt).
IMPACT-es compact lexicon by Universitat d’Alacant/Universidad de Alicante and Fundación Biblioteca Virtual Miguel de Cervantes is licensed under the Creative Commons Attribution-ShareAlike 3.0 license (see file LICENSE CC BY-SA.txt) and the GPL3 license (see file LICENSE GPL.txt).
IMPACT-es can be downloaded from the following links: