It strikes me as ironic that corpus linguists would want to prescribe
the usage of the word "corpus". Using Oliver's terminology, I would say
that all corpora are `filtered'. Choosing 13th-century texts, or
Shakespeare's plays, or conversations with a travel agent, or the Bible,
etc., are all ways of filtering the abstract body of language
around us for a specific purpose, since they all involve a criterion of
what is in and what is out of the corpus.
So, if Francois' purpose is to study proverbs, he could just as well do
it using a corpus-based methodology (I'm not saying anything about
whether that is appropriate or not -- it all depends on what his actual
goals are). And if someone else wants to study the intra-sentential
behavior of past tense verbs, they might just as well collect a corpus
of past tense sentences. Btw, recently I have also heard of corpora of
images, which go even further from the original "collection of
texts" definition brought up by Paul Hays.
I would agree with Oliver when he says:
    My understanding of `corpus' is that it is some more or less
    homogeneous collection of utterances, but not `filtered'
if "homogeneous" meant that there is a criterion for selecting what is in
and what is out; and (in order not to make the above 'definition'
contradictory) "not `filtered'" meant that no further restriction should
be imposed on the data, beyond the mentioned selection criterion (as
Paul mentioned, this is sometimes hard to achieve).
Have a beautiful day!
This archive was generated by hypermail 2b29 : Fri Jan 28 2000 - 01:48:22 MET