Re: Corpora: Negative mutual information?

From: Philip Resnik (
Date: Thu Mar 08 2001 - 19:55:34 MET


    > I have a question about calculating mutual information for bigrams
    > in text. According to every definition I've seen of MI, the values
    > are non-negative. However, I've found that for some bigrams made
    > of common words in very uncommon bigrams, the value is less than
    > zero. Does anyone know how to interpret a negative mutual
    > information?

    Where have you seen a definition suggesting (pointwise) MI must be
    non-negative? The definition is based on a comparison between the
    observed co-occurrence probability for the two words (i.e. the joint
    probability P(x,y)) and the co-occurrence probability one would
    expect to see if the two words were independent (i.e. the product of
    the marginal probabilities P(x) and P(y)); namely

      I(x,y) = log [ P(x,y) / (P(x)P(y)) ]

    If the two words occur together *exactly* as frequently as one would
    expect by chance, the ratio inside the log is equal to 1, giving us
    I(x,y) = 0; if they occur more frequently than one would expect by
    chance, the ratio is greater than 1 so I(x,y) > 0; and conversely if
    they occur less frequently than one would expect by chance, the ratio
    is less than 1 so I(x,y) < 0.
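    As a minimal illustrative sketch of the three cases, here is the
    formula in Python. The function name pmi is hypothetical, and
    base-2 logarithms (bits) are an assumption, since the formula above
    leaves the base unspecified. The probabilities are made up and chosen
    as powers of two so the results come out exact:

```python
import math

def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information log2[P(x,y) / (P(x)P(y))], in bits."""
    if p_xy <= 0.0:
        # Pair never co-occurs: the ratio is 0 and the log diverges.
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))

# Made-up marginals P(x) = P(y) = 0.25, so chance co-occurrence is 0.0625.
print(pmi(0.0625, 0.25, 0.25))    # exactly chance -> 0.0
print(pmi(0.25, 0.25, 0.25))      # 4x chance -> 2.0
print(pmi(0.015625, 0.25, 0.25))  # 1/4 of chance -> -2.0
```

    Returning -inf for an unseen pair is just one design choice for this
    sketch; smoothing the counts before computing the ratio is another.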

    Nothing in principle or in practice prevents this last case, and the
    interpretation is that the two words are for some reason dissociated
    rather than associated, e.g. for linguistic reasons. For example,
    "he" and "write" are probably both quite frequent unigrams, but the
    bigram "he write" is highly unlikely because it violates number
    agreement between the subject and the verb. Hence one would predict
    I(he,write) < 0.
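    To see the sign emerge from raw counts, here is a rough sketch of the
    usual maximum-likelihood estimate computed from a token sequence. The
    tiny corpus is invented for illustration (it is not real data), and
    the helper pmi_from_tokens is hypothetical. Bigram probabilities are
    taken over the N-1 adjacent pairs, unigram probabilities over the N
    tokens, and logs are base 2:

```python
import math
from collections import Counter

def pmi_from_tokens(tokens: list[str], x: str, y: str) -> float:
    """MLE estimate of PMI (bits) for the adjacent bigram (x, y)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_x = unigrams[x] / len(tokens)
    p_y = unigrams[y] / len(tokens)
    p_xy = bigrams[(x, y)] / (len(tokens) - 1)
    if p_xy == 0.0:
        return float("-inf")  # unseen bigram: log of 0
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical toy corpus, sentence boundaries ignored: "he" and "write"
# are each frequent, but the bigram "he write" occurs only once.
tokens = ("he writes they write he writes we write he writes "
          "you write he writes i write he write").split()

print(pmi_from_tokens(tokens, "he", "writes"))  # positive: associated
print(pmi_from_tokens(tokens, "he", "write"))   # negative: dissociated
```

    With so little data the estimates are very noisy; the point is only
    that "he write" occurs less often than the product of the two
    unigram probabilities predicts, so its estimated PMI is below zero.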

    That said, note that the *average* mutual information between two
    random variables X and Y is defined as the relative entropy
    D( P(x,y) || P(x)P(y) ) between the joint distribution and the
    product-of-marginals (independence) distribution. Like any relative
    entropy, that value is indeed
    guaranteed to be non-negative; e.g. see Cover, T. M. and Thomas,
    J. A. (1991), Elements of Information Theory, Wiley, New York. The
    term "mutual information" is sometimes used to refer to the
    information-theoretic quantity of average mutual information, and
    sometimes used to refer to pointwise mutual information, which is a
    potential source of confusion.
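    The relationship between the two senses can be checked numerically:
    average MI is the expectation of pointwise MI under the joint
    distribution, so individual pairs can have negative pointwise values
    while the weighted sum cannot go below zero. The 2x2 joint table is a
    made-up example, and the helper average_mi is hypothetical (Python,
    base-2 logs):

```python
import math

def average_mi(joint: list[list[float]]) -> tuple[float, list[list[float]]]:
    """Return (average MI in bits, table of pointwise MI) for a joint table P(x, y)."""
    p_x = [sum(row) for row in joint]           # row marginals P(x)
    p_y = [sum(col) for col in zip(*joint)]     # column marginals P(y)
    pointwise = [[math.log2(p / (p_x[i] * p_y[j])) if p > 0 else float("-inf")
                  for j, p in enumerate(row)]
                 for i, row in enumerate(joint)]
    # I(X;Y) = D(P(x,y) || P(x)P(y)) = sum of P(x,y) * PMI(x,y), with 0 log 0 = 0.
    avg = sum(p * pointwise[i][j]
              for i, row in enumerate(joint)
              for j, p in enumerate(row) if p > 0)
    return avg, pointwise

# Made-up joint distribution: matching values co-occur more than chance.
joint = [[0.375, 0.125],
         [0.125, 0.375]]
avg, pointwise = average_mi(joint)
print(pointwise)  # off-diagonal cells are -1.0 (pointwise MI < 0)
print(avg)        # yet the average is still positive
```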


      Philip Resnik, Assistant Professor
      Department of Linguistics and Institute for Advanced Computer Studies

      1401 Marie Mount Hall           UMIACS phone: (301) 405-6760
      University of Maryland          Linguistics phone: (301) 405-8903
      College Park, MD 20742 USA      Fax: (301) 405-7104
                                      E-mail:
