Getting started with browsing the PAISÀ corpus

This interface offers online access to the PAISÀ corpus, a collection of about 380,000 Italian web texts. The corpus can be searched for example sentences as well as for full-text documents. Search results can be inspected online or downloaded. In particular, the analysis of syntactic dependency relations (such as subject or object relations) is supported by Extended Linguistic Dependency Diagrams (xLDD), an interactive visualization interface.

The interface aims to provide an easy yet powerful tool for working with the corpus. It is designed to support users with different needs in terms of usability and query power, and does so through four modes of access: Simple Search, Advanced Search, CQP Search, and Filters.

Index of help pages:

  1. Simple Search
    1. Selecting a (sub-)corpus
    2. Example queries
    3. Displaying results
    4. Exporting data
  2. Advanced Search
    1. Searching for inflected forms, lemmas and parts of speech
    2. Searching with regular expressions
    3. Searching for sequences of words
    4. Searching for dependency relations
    5. Displaying results
  3. CQP Search
    1. Examples in CQP syntax
    2. Adjusting display settings
    3. Storing results in named subcorpora
    4. Examples of complex queries
    5. Limitations on CQP search
  4. Filters
    1. Filter criteria
    2. Lists of text documents
    3. Named subcorpora
    4. Word Cloud
  5. General
    1. Export formats
    2. Using the dependency visualization
      1. Display options for dependency relations
      2. Dependency tagset
      3. Display options for words and lemmas
      4. Display options for parts of speech
      5. Part-of-speech tagset
    3. Criteria for "readability" of sentences
      1. Advanced vocabulary
      2. Type-token ratio
      3. Gulpease index
    4. Corpus composition and annotation