InformationRetrievalLab Relevant search results



Coruña Corpus Tool

The Coruña Corpus Tool (CCT) was developed by the IRLab in collaboration with the English Department of the University of A Coruña. The application arose from the Muste Group's need for a system to manage and exploit its linguistic corpus.
Its objective is to help linguists extract and condense information that is valuable for their research. The application is not tied to the Coruña Corpus, however: it supports any XML-formatted corpus, so it can be used widely.
As a commercial product, the CCT offers:

  • Linguistic corpus management, covering not only the document text but also author information and styled document rendering.
  • Processing and validation of TEI-encoded documents, with support for non-standard characters. Format errors are reported so that linguists can correct them.
  • Basic single-term search within a document and across the collection.
  • Concordance generation (keyword in context) for every occurrence of a term, with its location in the document.
  • Prefix, suffix, and regular-expression search, which is very useful for linguistic work.
  • Phrase search with a specified term distance, for finding linguistic structures.
  • Generation of type and token lists at the document and collection level, to allow statistical study of term occurrences.
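
To make the search features above more concrete, here is a minimal Python sketch of keyword-in-context concordances, prefix/suffix/regular-expression matching, distance-bounded phrase search, and type/token counting. It is not the CCT's implementation: the function names (`tokenize`, `concordance`, `near`, `type_token_list`), the naive word tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Naive tokenizer (an assumption): runs of word characters, lowercased.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def concordance(tokens, pattern, width=3):
    """Keyword-in-context hits for tokens that fully match the regex `pattern`.

    Returns (position, left context, keyword, right context) tuples.
    A prefix search is r"mo\\w*", a suffix search is r"\\w*s".
    """
    rx = re.compile(pattern)
    hits = []
    for i, tok in enumerate(tokens):
        if rx.fullmatch(tok):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((i, left, tok, right))
    return hits


def near(tokens, first, second, max_distance):
    """Pairs (i, j) where `second` follows `first` at most max_distance tokens later."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        stop = min(len(tokens), i + 1 + max_distance)
        for j in range(i + 1, stop):
            if tokens[j] == second:
                pairs.append((i, j))
    return pairs


def type_token_list(tokens):
    """Frequency of each type; len(result) is the type count, len(tokens) the token count."""
    return Counter(tokens)


# Hypothetical sample text, standing in for one corpus document.
text = "The moon moves round the earth, and the earth moves round the sun."
tokens = tokenize(text)

for pos, left, kw, right in concordance(tokens, r"mo\w*", width=2):
    print(f"{pos:>3}  {left:>12} [{kw}] {right}")

print(near(tokens, "moves", "the", 2))
print(type_token_list(tokens).most_common(3))
```

In this sketch, collection-level results would come from running the same functions over the concatenated or per-document token lists; a real tool would more likely use an inverted index than a linear scan.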
