Partnership of Paisà

The project is a joint effort of:

Project Lead

Responsibilities are divided among the partners as follows:

[corpus creation]
The corpus is collected by the University of Trento. Copyright-free text materials are bootstrapped from the web. The harvested texts are automatically cleaned by stripping HTML tags and other formatting and navigation data (for more information, see construction steps).
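As a rough illustration of the cleaning step described above, the following minimal Python sketch strips tags and drops the content of some non-text elements using only the standard library. It is not the project's actual pipeline: the names `TextExtractor` and `strip_html`, and the set of elements treated as navigation, are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, ignoring tags and selected non-content elements."""

    # Elements whose content is discarded. This list is an assumption for
    # illustration, not the set used by the project.
    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when outside every skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(page):
    """Return the text of an HTML page with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(page)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

For example, `strip_html("<nav><a href='/'>Home</a></nav><p>Testo <b>pulito</b></p>")` returns `"Testo pulito"`: the navigation link is dropped and the markup around the paragraph text is removed.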
[corpus annotation]
The linguistic annotation of the corpus combines manual and automatic annotation procedures. Manually annotated data is used to refine the computational linguistic methods and tools used for corpus annotation (for more information, see construction steps). The manual annotation of corpus texts and the evaluation of analysis tools are carried out by researchers at the University of Bologna, the University of Trento, and CNR Pisa. Tools are developed, adjusted, and applied by CNR Pisa.
[corpus interface]
The corpus is made available to the public via a free online interface. The multifaceted user interface for language learners and researchers is created by the European Academy of Bozen/Bolzano.