Produced by: University of Warsaw
The ground truth material for Polish consists of books published from 1617 to 1756 and the Digital Library of Polish and Poland-Related News Pamphlets, covering 1570 to 1728.
Prior to IMPACT, there were practically no historical corpora of Polish, which caused various problems from the very beginning. One of them was the lack of standards for representing old Polish texts in Unicode, as several necessary characters and ligatures are provided neither by Unicode proper nor by the Medieval Unicode Font Initiative.
The primary resource was the Internet dictionary we shall refer to as the “Late Middle Polish dictionary”, its official name being “The dictionary of the Polish language of the sixteenth and the first half of the seventeenth century”.
The current lexicon consists of 9,909 lemmata, 24,977 word forms and 26,736 lemma/word form combinations.
Also, a set of more than 100 rules for historical spelling of Polish, developed for the IMPACT project, is now available.
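The combination count is higher than the word form count because a single word form can belong to more than one lemma. The following is a minimal sketch of how the three figures relate, using toy data; the pairs, function names and data layout are assumptions made for illustration and do not reflect the actual format of the IMPACT lexicon.

```python
from collections import defaultdict

# Toy (lemma, word form) pairs; illustrative only, not taken from the IMPACT lexicon.
pairs = [
    ("być", "jest"),
    ("być", "był"),
    ("mieć", "ma"),
    ("mój", "ma"),  # same word form, different lemma
]


def lexicon_counts(pairs):
    """Return (number of lemmata, number of word forms, number of combinations)."""
    unique_pairs = set(pairs)
    lemmata = {lemma for lemma, _ in unique_pairs}
    forms = {form for _, form in unique_pairs}
    return len(lemmata), len(forms), len(unique_pairs)


def ambiguous_forms(pairs):
    """Return the word forms that are attached to more than one lemma."""
    by_form = defaultdict(set)
    for lemma, form in pairs:
        by_form[form].add(lemma)
    return {form: sorted(lemmas) for form, lemmas in by_form.items() if len(lemmas) > 1}


print(lexicon_counts(pairs))
print(ambiguous_forms(pairs))
```

In this toy data, every ambiguous word form adds a combination without adding a new word form, which is the same effect that separates the two totals reported for the lexicon.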
- IMPACT deliverable DEE3.13 Polish Lexicon Documentation (February 2012)
- Szafran, K. and M. Kresa. Glosa do leksykografii polskiej (New uses of historical dictionaries – in Polish), Glossa III lexicographic conference, 15–16 September 2011, Warsaw, Poland
- Bień, Janusz S. (2014) The IMPACT project Polish Ground-Truth texts as a DjVu corpus. Cognitive Studies | Études Cognitives (14). pp. 75-84. ISSN 2080-7147
The Polish lexicon is freely available, but distributing resources derived from the Late Middle Polish Dictionary requires the explicit permission of the Institute of Polish Language of the Polish Academy of Sciences. Distribution of resources derived from Morfeusz and the SAM analyser must also comply with their respective licenses.
For further information on licensing, please contact the University of Warsaw IMPACT Group.
The rules for historical spelling of Polish developed for the IMPACT project are available under the GNU GPL license at https://bitbucket.org/jsbien/pol
Various scripts and data related to historical Polish are available from the IMPACT Centre of Competence GitHub page. The lexicon can be downloaded by members of the IMPACT Centre of Competence and registered users.