The Filter interface lets the user retrieve full-text corpus documents that match a set of user-specified criteria. The matching documents can be inspected one by one or downloaded as .txt files in a .zip archive.

The Filter interface can also create named subcorpora made up of the filtered texts. Once created, a subcorpus appears in the corpus drop-down menu and can be queried just like the full PAISÀ corpus.

Filter criteria

Filtered results are presented in three ways:

Lists of text documents

You can page through the list of texts that satisfy the filtering criteria by clicking the arrow icons (see screenshot below); individual texts open in a separate tab when you click their file name or icon.

[Screenshot: example list of texts]

Named subcorpora

To store a named subcorpus containing all corpus texts that satisfy the filtering criteria, enter a name for it in the appropriate field (see screenshot below) before clicking "submit". The name must start with a capital letter and may contain only letters, digits and underscores.
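The naming rule can be expressed as a regular expression. The following is only a sketch of a client-side check, not the interface's actual validation code; it assumes "letters" means the ASCII letters A-Z and a-z and that no length limit applies, neither of which the interface documents. The function name `is_valid_subcorpus_name` is hypothetical.

```python
import re

# Assumption: "letters" means ASCII A-Z / a-z; the real interface may
# accept other characters or impose a length limit that is not documented.
SUBCORPUS_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")


def is_valid_subcorpus_name(name: str) -> bool:
    """Hypothetical check: the name starts with a capital letter and
    contains only letters, digits and underscores."""
    return SUBCORPUS_NAME.fullmatch(name) is not None


for candidate in ["Holidays_2010", "holidays", "Ferie-estate", "A"]:
    print(candidate, is_valid_subcorpus_name(candidate))
```

`fullmatch` is used instead of `match` so that a name with a trailing invalid character (for example a hyphen) is rejected rather than partially accepted.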

[Screenshot: example named subcorpus]

User-defined subcorpora appear in the corpus drop-down menu and can be used for later queries. The subcorpus called "Last" always contains the results of the user's most recent query or filter operation.

Word Cloud

The word cloud is built from the word frequencies of 80 of the documents that satisfy the filtering criteria. Words are displayed in alphabetical order and sized according to their frequencies.
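The general technique (count word frequencies over a document sample, sort the words alphabetically, scale each word by its frequency) can be sketched as follows. This is not the interface's implementation. The choice of the first 80 documents, the tokenization with `\w+`, the lowercasing, the `max_words` cut-off and the linear mapping onto font sizes are all assumptions made for illustration, and `word_cloud` is a hypothetical name.

```python
import re
from collections import Counter


def word_cloud(documents, max_docs=80, max_words=50, min_size=10, max_size=40):
    """Hypothetical sketch: count words in at most max_docs documents,
    keep the max_words most frequent, sort them alphabetically and map
    each frequency linearly onto a font size in [min_size, max_size]."""
    counts = Counter()
    # Assumption: simply take the first max_docs documents as the sample.
    for text in documents[:max_docs]:
        counts.update(re.findall(r"\w+", text.lower()))
    top = counts.most_common(max_words)
    if not top:
        return []
    lo = min(freq for _, freq in top)
    hi = max(freq for _, freq in top)
    span = hi - lo or 1  # avoid division by zero when all frequencies are equal
    return [
        (word, freq, min_size + (freq - lo) * (max_size - min_size) / span)
        for word, freq in sorted(top)  # alphabetical order by word
    ]


docs = ["Le ferie al mare", "ferie in montagna", "mare e ferie"]
cloud = word_cloud(docs)
for word, freq, size in cloud:
    print(f"{word}: {freq} -> {size:.0f}pt")  # "ferie" gets the largest size
```

A linear scale is the simplest choice; with very skewed frequency distributions a logarithmic scale often keeps rare words more legible.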

The screenshot below shows a word cloud for documents filtered by the keyword "ferie".

[Screenshot: example word cloud]

The word cloud is implemented with the Google Visualization API.

