IMPACT Polish GT Corpora

Produced by: University of Warsaw

Abstract


The search engine, made available by the Formal Linguistics Department of the University of Warsaw, supports searching digitized texts in the DjVu format. The engine is a modified version of the Poliqarp system (developed at the Institute of Computer Science of the Polish Academy of Sciences), which is used to support the National Corpus of Polish, so it uses the same query syntax. The modification was implemented by Jakub Wilk, who also converted most of the texts to a suitable format. The idea of using Poliqarp for DjVu texts was developed by Janusz S. Bień. It was presented in a paper entitled “Facilitating access to digitalized dictionaries” and later in other publications, including “Efficient search in hidden text of large DjVu documents”.

Publications

Language resources

The Impact Centre of Competence provides historical and named-entity lexica for the following languages. In addition, we offer access to various corpora.