Re: Corpora: Zipf law for different languages?

From: E S Atwell (
Date: Tue Nov 14 2000 - 14:50:09 MET

    You ask an interesting question! We have some empirical results showing
    that a Zipfian distribution recurs across a range of natural languages,
    and hence could be used as a characteristic feature to look for when
    seeking "language" in unknown signals; see

    Elliott J, Atwell E and Whyte B. 2000. Language identification in unknown
    signals. In Proceedings of COLING'2000, 18th International Conference on
    Computational Linguistics, pages 1021-1026, Association for Computational
    Linguistics (ACL) and Morgan Kaufmann Publishers, San Francisco.
    ISBN: 1-55860-717-X (2 volumes).

    Elliott J, Atwell E and Whyte B. 2000. Increasing our ignorance of
    language: identifying language structure in an unknown signal. In
    Daelemans W (ed) Proceedings of CoNLL-2000: International Conference on
    Computational Natural Language Learning, Lisbon, Portugal.

    Elliott J and Atwell E. 1999. Language in signals: the detection of
    generic species-independent intelligent language features in symbolic and
    oral communications. In Proceedings of the 50th International
    Astronautical Congress, paper IAA-99-IAA.9.1.08, Amsterdam. International
    Astronautical Federation, Paris.

    Elliott J and Atwell E. 2000. Is anybody out there?: the detection of
    intelligent and generic language-like features. In Journal of the British
    Interplanetary Society, volume 53, no. 1/2, pages 13-22, British
    Interplanetary Society, London. ISSN: 0007-084X.

    (See my WWW homepage for preprints of these papers.)

    However, we did not try to measure the variation WITHIN the set of natural
    languages. If you get any replies to your search, please copy these to us
    as we would like to know too!
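
    For anyone who wants to compare exponents across languages or genres
    directly, here is a minimal sketch of one common way to estimate a Zipf
    exponent from a list of tokens. It is an illustration only, not the
    method used in the papers above: it assumes frequency is roughly
    C / rank**s and fits s by ordinary least squares on the log-log
    rank-frequency data. The function name zipf_exponent is hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in frequency ~ C / rank**s (an assumed model).

    Fits a straight line to log(frequency) against log(rank) by ordinary
    least squares and returns the negated slope.
    """
    # Frequencies of each distinct token, largest first (rank 1 = most frequent).
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct token types")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying distribution; s is its negation.
    return -sxy / sxx
```

    Running this on tokenised corpora of comparable size for each language
    would give one exponent per corpus. A plain log-log fit is sensitive to
    the long tail of rare words, so the figures are best treated as rough
    and compared only between corpora prepared in the same way.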

    Good luck,

    Eric Atwell

    On Mon, 13 Nov 2000, Alexander Gelbukh wrote:

    > Dear colleagues,
    >
    > Where can I find something about the differences in Zipf law for different
    > languages or genres? Say, different exponent etc.
    >
    > Thank you!
    > Alexander

