What is a “corpus”?

April 22, 2008 at 10:52 am | Posted in Language Resources

According to An Encyclopedic Dictionary of Language and Languages (Crystal, David. 1992. Oxford, 85), a corpus is a collection of linguistic data, compiled either as written texts or as a transcription of recorded speech. The main purpose of a corpus is to verify a hypothesis about language, for example to determine how the usage of a particular sound, word, or syntactic construction varies.

Corpus linguistics deals with the principles and practice of using corpora in language study. A computer corpus is a large body of machine-readable texts.

In the EAGLES recommendations on corpus typology (EAGLES, 1996e), a corpus is defined as:

Corpus: A collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.

Words such as ‘collection’ and ‘archive’ refer to sets of texts that do not need to be selected, or do not need to be ordered, or whose selection and/or ordering do not need to be based on linguistic criteria. They are therefore quite unlike corpora.

Linguistic criteria to be applied to the selection and ordering may be:

External: they concern the participants, the occasion, the social setting or the communicative function of the pieces of language.

Internal: they concern the recurrence of language patterns within the pieces of language.

These criteria are reviewed in more detail in the recommendations on corpus typology (EAGLES, 1996e) where a classification of different types of corpora can also be found.

Since this document is devoted to computer corpora, it is appropriate to start with the definition proposed in the same recommendations:

Computer corpus: a corpus which is encoded in a standardised and homogeneous way for open-ended retrieval tasks.
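To make these definitions more concrete, here is a minimal Python sketch of what "encoded in a standardised and homogeneous way" and "open-ended retrieval" might look like in practice. It is only an illustration, not something taken from the EAGLES recommendations: the `Sample` record, its `mode` and `setting` fields (standing in for external criteria), the naive tokenizer and the three example sentences are all assumptions made up for this post.

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Sample:
    """One piece of language plus a description of its external situation."""
    text: str
    mode: str      # hypothetical external criterion: "written" or "spoken"
    setting: str   # hypothetical external criterion: the social setting


def tokenize(text):
    """Return lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def select(samples, **criteria):
    """Keep the samples whose external attributes match every criterion."""
    return [s for s in samples
            if all(getattr(s, key) == value for key, value in criteria.items())]


def relative_frequency(samples, word):
    """Occurrences of `word` per token across the given samples."""
    counts = Counter(tok for s in samples for tok in tokenize(s.text))
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# Invented toy data: every sample is encoded with the same fields.
corpus = [
    Sample("Well, I mean, it was kind of late.", "spoken", "conversation"),
    Sample("I mean, we could go tomorrow.", "spoken", "conversation"),
    Sample("The results were obtained late in the study.", "written", "academic"),
]

spoken = select(corpus, mode="spoken")
written = select(corpus, mode="written")

# Compare how often one word is used in each part of the corpus.
print(relative_frequency(spoken, "mean"), relative_frequency(written, "mean"))
```

Because every sample carries the same fields, the same `select` call works for any combination of external criteria, and the retrieval step is not tied to one question: a different word or a different statistic can be plugged in without re-encoding the texts.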

Source: ILC
