    Dear List Members,

    Last week I forwarded a message asking for help (hints, comments,
    literature, etc.) on frequency occurrences (see original message below).

    Thanks a million to the people who answered:

    Tony Berber Sardinha
    Raphael Salkie
    Adam Kilgarriff
    William Mann
    Daniel Walker
    Linda Bawcom
    Jerome Richalot

    This is a summary of the comments, literature and websites suggested:


    William Mann:
    Remember that in the very early reports on the Brown Corpus (the grandfather
    of all), the word "jabberwocky" showed up with fairly high frequency.

    Daniel Walker:
    > What do frequencies exactly tell?
    Well, frequencies can give an idea of how likely some event is. A nice
    analogy is the linguistic notion of markedness. The more likely a
    linguistic phenomenon, the less marked it is, and vice versa. More
    generally, statistics provide a well-formed way to incorporate empirical
    evidence into linguistic studies.
    > And more interesting, what do they hide?
    > How misleading/erroneous can they be?
    > How far can we rely on them?
    It's hard to make inferences about infrequent events. This is both a good
    and a bad thing. For example, sentences which would fail a grammaticality
    judgement may be infrequent, providing empirical support for native
    intuition. On the other hand, most of language is infrequent (this is
    similar to Chomsky's notion of Poverty of Stimulus), which means it can be
    very difficult to collect examples of interesting phenomena. Most texts
    have a bias towards some domain and can be misleading. For example, just
    because the bilingual proceedings of the Canadian parliament translate
    'House' as 'Chambre' 75% of the time doesn't necessarily indicate that
    'House' rarely means 'maison'. The limitations of statistics in linguistics
    vary according to what you're measuring and how you measure it. There are
    well-formed techniques for making cut-off and significance decisions, but
    there is also a need for experimentation and maybe even art.
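
    As an illustration of one such significance decision, here is a minimal
    sketch of the log-likelihood ratio (G2) test, which is often used to
    compare how frequent a word is in two corpora. The reply above does not
    name a specific test, so the choice of G2, the function name and the
    counts in the usage example are assumptions made for illustration only.

```python
import math


def log_likelihood(a, n1, b, n2):
    """Log-likelihood ratio (G2) for a word seen `a` times in a corpus of
    `n1` tokens and `b` times in a corpus of `n2` tokens.

    Builds the 2x2 table (word / all other tokens) x (corpus 1 / corpus 2)
    and returns 2 * sum(O * ln(O / E)) over its four cells.
    """
    observed = [[a, n1 - a], [b, n2 - b]]
    row_totals = [n1, n2]
    col_totals = [a + b, (n1 - a) + (n2 - b)]
    total = n1 + n2

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / total  # expected count
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / e)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts: 30 hits in 1,000 tokens vs 10 hits in 1,000 tokens.
    score = log_likelihood(30, 1000, 10, 1000)
    # 3.84 is the chi-square critical value for p < 0.05 with 1 degree of freedom.
    print(f"G2 = {score:.2f}, above 3.84: {score > 3.84}")
```

    A score above the chosen cut-off only says that the difference is
    unlikely to be chance; it says nothing about domain bias of the kind
    shown by the 'House'/'Chambre' example.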
    > What other features/aspects/measures should also be considered?
    > Are there ways/techniques to "correct" frequencies indices, statistically?
    > I would most appreciate ideas, comments and literature on this issue.
    There are many interesting and useful statistics that one can take from
    some body of text and many techniques can be used to "correct" or smooth

    Linda Bawcom:
    John Sinclair (1991), Corpus, Concordance, Collocation, says: 'Any instance
    of language depends on its surrounding context. The details of choice shown
    in any segment of a text depend-some of them-on choices made elsewhere in
    the text, and so no example is ever complete unless it is a whole text'. (p.
    Michael Hoey also said at the TESOL Spain conference (1997?):
    'Wordlists homogenize the heterogeneous'.
    So, for me, the frequency of a word is only the first step. That is, it
    is interesting in itself, but it is not that important (unless one is
    compiling a dictionary like COBUILD). For me (as a teacher), the most
    important thing is the context: how the word collocates, colligates or
    co-occurs. That is, if I am a language learner, and a lazy one at that
    (which I am!), and my teacher tells me that two words are synonyms, I am
    going to learn only one of them.
    What I have seen is that 1) (as for trusting a corpus) what you get out
    of a corpus depends a lot on the corpus, so you have to be very careful
    about its purpose; 2) you cannot classify whole sets of words the way
    learners' textbooks do (e.g. ways of looking, ways of touching) without
    giving a context.
    An example: I am looking (for a presentation) at the difference between
    'tal vez' and 'quizá'. What I have seen is that 'quizá' is followed 8
    times more often than 'tal vez' by 'por eso' or 'para mas alguna razón',
    and also that both have a negation in their contexts in almost half of
    the instances. I don't know why.
    Now, as a native speaker, you no doubt already knew this. But I was
    surprised.
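
    A comparison like the 'tal vez' / 'quizá' one can be sketched as below:
    count how often each expression is followed, within a few tokens, by a
    given phrase, and compare the proportions and not the raw counts. This is
    only an illustration; the function name, the window size and the toy
    sentences are assumptions and are not taken from the corpus described
    above.

```python
def followed_by_rate(tokens, target, followers, window=3):
    """Return (hits, total, rate): how many occurrences of `target` (a tuple
    of tokens) have one of `followers` (tuples of tokens) starting within
    the `window` tokens that come right after the target."""
    n = len(target)
    hits = total = 0
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) != target:
            continue
        total += 1
        span = tokens[i + n:i + n + window]
        # Look for any follower phrase starting at any position in the span.
        if any(tuple(span[j:j + len(f)]) == f
               for f in followers
               for j in range(len(span))):
            hits += 1
    rate = hits / total if total else 0.0
    return hits, total, rate


if __name__ == "__main__":
    # Invented toy text, whitespace-tokenised; not real corpus data.
    text = ("quizá por eso no vino . tal vez no quiera venir . "
            "quizá llueva mañana . tal vez por eso se fue")
    tokens = text.split()
    followers = [("por", "eso")]
    for target in [("quizá",), ("tal", "vez")]:
        print(" ".join(target), followed_by_rate(tokens, target, followers))
```

    Reporting the rate next to the raw counts keeps the comparison fair
    when one expression is much more frequent than the other.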

    Jerome Richalot:
