Centrum für Informations- und Sprachverarbeitung – Ludwig-Maximilians-Universität

The Center for Information and Language Processing (Centrum für Informations- und Sprachverarbeitung, CIS) is the computational linguistics institute of the University of Munich (LMU). CIS conducts interdisciplinary research on natural language processing (NLP) and its theoretical foundations. The NLP problems we work on include computational syntax and semantics, sentiment analysis, machine translation, semi-supervised learning, and the adaptation and extension of lexical resources.

The applications CIS research has traditionally focussed on are information extraction (IE), information retrieval (IR) and the NLP resources needed for IE/IR, as well as library technology. Our work on IR includes methods for approximate search and the development of search engines that can exploit structured NLP analysis of documents. In our spin-off TopicZoom (http://www.topiczoom.de/) we develop techniques for automatically retrieving the topics that occur in documents and archives and for using them as a basis for semantic search. More recently we have also started focussing on applications in the humanities. We collaborate with scholars of language (the crowdsourcing platform Play4Science) and philosophers (work on an electronic Wittgenstein edition).

In the fields of OCR, digitization, IR and library science, our main achievements are:

  • an electronic lexicon for historical German, which helps to improve OCR and IR
  • an advanced open-source system for interactive postcorrection of OCR results
  • a web service for “profiling” historical OCRed documents in terms of their prominent OCR errors and the kinds of historical language variation found in the texts. Profiles can be loaded into the postcorrection system, which considerably reduces the time needed to correct texts.

CIS also provides the following tools and data:

