Re: Corpora: Zipf law for different languages?

From: E S Atwell
Date: Tue Nov 14 2000 - 14:50:09 MET

    You ask an interesting question! We have some empirical results showing
    that a Zipfian distribution recurs across a range of natural languages,
    and hence could be used as a characteristic feature to look for when
    seeking "language" in unknown signals; see:

    Elliott J, Atwell E and Whyte B. 2000. Language identification in unknown
    signals. In Proceedings of COLING'2000, 18th International Conference on
    Computational Linguistics, pages 1021-1026, Association for Computational
    Linguistics (ACL) and Morgan Kaufmann Publishers, San Francisco.
    ISBN: 1-55860-717-X (2 volumes).

    Elliott J, Atwell E and Whyte B. 2000. Increasing our ignorance of
    language: identifying language structure in an unknown signal. In
    Daelemans W (ed) Proceedings of CoNLL-2000: International Conference on
    Computational Natural Language Learning, Lisbon, Portugal.

    Elliott J and Atwell E. 1999. Language in signals: the detection of
    generic species-independent intelligent language features in symbolic and
    oral communications. In Proceedings of the 50th International
    Astronautical Congress, paper IAA-99-IAA.9.1.08, Amsterdam. International
    Astronautical Federation, Paris.

    Elliott J and Atwell E. 2000. Is anybody out there?: the detection of
    intelligent and generic language-like features. In Journal of the British
    Interplanetary Society, volume 53 no.1/2 pages 13-22, British
    Interplanetary Society, London. ISSN: 0007-084X.
    (see my www homepage for preprints of these papers)

    However, we did not try to measure the variation WITHIN the set of natural
    languages. If you get any replies to your search, please copy these to us
    as we would like to know too!
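
    For anyone who wants to try such a within-language comparison, here is a
    minimal sketch of one way to estimate a Zipf exponent from a text sample.
    It is not the method used in the papers above: the function name
    zipf_exponent, the regex tokenisation, and the ordinary least-squares fit
    on the log-log rank-frequency curve are all assumptions made for
    illustration, and a least-squares fit is only a crude estimator.

```python
import math
import re
from collections import Counter


def zipf_exponent(text):
    """Estimate s in freq ~ C / rank**s via least squares on log-log data.

    Hypothetical helper: tokenisation (lowercased \\w+ runs) and the fitting
    method are illustrative assumptions, not taken from the cited papers.
    """
    words = re.findall(r"\w+", text.lower())
    # Word-type frequencies, largest first, so index + 1 is the rank.
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying curve; report s = -slope.
    return -sxy / sxx
```

    Running this on comparable-sized samples from two languages or genres
    and comparing the returned values would give a first rough indication of
    whether the exponent differs. Sample size and tokenisation affect the
    estimate, so the corpora should be prepared the same way.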

    Good luck,

    Eric Atwell

    Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
    School of Computing, University of Leeds, LEEDS LS2 9JT
    TEL: (44)113-2335430  FAX: (44)113-2335468
    WWW:  EMAIL:

    On Mon, 13 Nov 2000, Alexander Gelbukh wrote:

    > Dear colleagues,
    >
    > Where can I find something about the differences in Zipf law for different
    > languages or genres? Say, different exponent etc.
    >
    > Thank you!
    > Alexander
    >
    > =====================================
    > Prof. Dr. Alexander Gelbukh (Alexandre Guelboukh Kahn),
    > Professor and researcher, head of NLP Lab.
    > Lab. de Lenguaje Natural, Centro de Investigacion en Computacion,
    > IPN, Av. Juan Dios Batiz s/n esq. Mendizabal, UP Adolfo L. Mateos,
    > Col. Zacatenco CP 07738, Mexico DF., Mexico
    > Office: (+52) 5729-6000 ext. 56544, 56518, 56602.
    > Fax and Voice (answering machine): +1 (520) 441-1817 (personal).
    > Shared fax: (+52) 5586-2936. Home: (+52) 5597-0709.
    > =====================================
