One possible source of terminological confusion (though I'm not sure
if this is what Ted meant) is the fact that "mutual information" in
information theory relates two random variables X and Y (e.g., see
Cover, T. and J. Thomas, Elements of Information Theory, Wiley, 1991):
    I(X,Y) = Sum_{x,y} Pr(x,y) log [ Pr(x,y) / (Pr(x) Pr(y)) ]
This is equal to the expected value, under Pr(x,y), of a quantity that
is, misleadingly, also called "mutual information":
    I(x,y) = log [ Pr(x,y) / (Pr(x) Pr(y)) ]
One solution to this confusion that I've seen is to refer to the
former quantity as "average mutual information" and the latter as
"pointwise mutual information", which seems as good a way to go as
any, since completely renaming either quantity is a practical
impossibility.
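
To make the relationship between the two quantities concrete, here is
a minimal Python sketch. The toy joint distribution and the function
names are invented for illustration (they are not from Cover and
Thomas), and natural logarithms are assumed. It computes the pointwise
value for each (x,y) pair, then the average value as the expectation
of those pointwise values under the joint distribution.

```python
import math

def pointwise_mi(joint, x, y):
    """log [ Pr(x,y) / (Pr(x) Pr(y)) ] for one pair of outcomes."""
    # Marginals are obtained by summing the joint over the other variable.
    px = sum(p for (a, _), p in joint.items() if a == x)
    py = sum(p for (_, b), p in joint.items() if b == y)
    return math.log(joint[(x, y)] / (px * py))

def average_mi(joint):
    """Expected value of pointwise MI under the joint distribution."""
    # Pairs with zero probability contribute nothing and are skipped.
    return sum(p * pointwise_mi(joint, x, y)
               for (x, y), p in joint.items() if p > 0)

# Hypothetical joint distribution over two binary variables.
joint = {("a", "c"): 0.4, ("a", "d"): 0.1,
         ("b", "c"): 0.1, ("b", "d"): 0.4}

for pair in joint:
    print(pair, round(pointwise_mi(joint, *pair), 4))
print("average MI:", round(average_mi(joint), 4))
```

Note that individual pointwise values can be negative (here, for the
pairs that co-occur less often than chance), while the average is
never negative.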
Incidentally, the modification of their MI-like association ratio,
                                    frequency(words together)
    frequency(words together) * ---------------------------------,
                                frequency(wordA)*frequency(wordB)
seems related in spirit to a measure of association that I proposed,
which I have called "selectional association" (because it was
developed in the context of a model of selectional preferences,
e.g. of verbs for their arguments). It can be written:
                                      Prob(x and y together)
    A(x,y) = (1/Norm) Prob(x|y) log ---------------------------
                                    Prob(x alone)*Prob(y alone)
where Norm(alization) is the sum, over all x, of the unnormalized
term (Prob(x|y) times the log ratio), so that A(x,y) sums to 1 over
x. Like Andrew, I
multiplied the association ratio I was using (in my case, pointwise
mutual information) by an additional application of frequency (here,
the conditional probability of x given y). This had better behavior
than pointwise mutual information for similar reasons (avoiding
problems associated with low-frequency values of x, given y; note the
asymmetry). [P. Resnik (1996), "Selectional constraints: an
information-theoretic model and its computational realization",
Cognition 61, pp. 127-159.]
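
A minimal Python sketch of the A(x,y) formula above, for a single
fixed y. It takes Norm to be the sum over x of the unnormalized terms.
The function name, the toy probabilities, and the guard for a zero
Norm are assumptions made for illustration; this is not the
implementation from the paper.

```python
import math

def selectional_association(joint, y):
    """A(x,y) for every x that co-occurs with one fixed y.

    joint maps (x, y) pairs to Prob(x and y together).
    """
    p_y = sum(p for (_, b), p in joint.items() if b == y)
    raw = {}
    for (x, b), p_xy in joint.items():
        if b != y or p_xy == 0:
            continue
        p_x = sum(p for (a, _), p in joint.items() if a == x)
        pmi = math.log(p_xy / (p_x * p_y))   # pointwise mutual information
        raw[x] = (p_xy / p_y) * pmi          # weighted by Prob(x|y)
    norm = sum(raw.values())
    if norm == 0:                            # assumption: y tells us nothing about x
        return {x: 0.0 for x in raw}
    return {x: v / norm for x, v in raw.items()}

# Hypothetical joint probabilities of (argument class, verb) pairs.
joint = {("food", "eat"): 0.30, ("food", "discuss"): 0.10,
         ("idea", "eat"): 0.04, ("idea", "discuss"): 0.55,
         ("truffle", "eat"): 0.01}

assoc = selectional_association(joint, "eat")
for x, a in sorted(assoc.items(), key=lambda item: -item[1]):
    print(x, round(a, 3))
```

In this toy data "truffle" is rare and occurs only with "eat", so
pointwise mutual information alone would rank it above "food"; the
extra factor of Prob(x|y) puts "food" first.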
Philip
----------------------------------------------------------------
Philip Resnik, Assistant Professor
Department of Linguistics and Institute for Advanced Computer Studies
1401 Marie Mount Hall UMIACS phone: (301) 405-6760
University of Maryland Linguistics phone: (301) 405-8903
College Park, MD 20742 USA Fax : (301) 405-7104
http://umiacs.umd.edu/~resnik E-mail: resnik@umiacs.umd.edu