From: Vladimir Rykov
Date: Mon Feb 07 2000

         It was very interesting for me to read the "What is a corpus"
         There really is a problem - what is a corpus, is it balanced and/or
         If we take as an example a corpus of proverbs - who
    can say that it is a corpus and not an archive, a set, or a dump of
    proverbs? We can find many interesting things in a dump - but
    what is the value of our findings? If we did no pre-processing
    (filtering) while creating our set of proverbs - then what is the
    value of the following statement: "There are no Italian proverbs about
    unlucky marriages"?
         This statement is reliable, or scientific, only for a representative
    proverb corpus. Otherwise - "dump as input - dump as output (dust to
    dust)". Is there a quasi-logical procedure for deciding whether a given
    collection (dump) of textual data is a representative corpus? This is the
    starting point for all subsequent activity - is it a scientific one or
    a paid hobby?

