I'm guessing that what you are looking at are pointwise
Mutual Information values, usually defined along these
lines for a bigram 'word1 word2':

  PMI(word1,word2) = log2( freq(word1,word2)*N / (freq(word1)*freq(word2)) )

where N is the number of bigrams in your sample.
This will go negative when

  freq(word1,word2)*N < freq(word1)*freq(word2)

or, equivalently, when

  N < freq(word1)*freq(word2)/freq(word1,word2)
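The definition above can be sketched in a few lines of Python (the function name `pmi` and the sample counts are mine, just for illustration):

```python
from math import log2

def pmi(freq_w1, freq_w2, freq_bigram, n_bigrams):
    """Pointwise Mutual Information for the bigram 'word1 word2'.

    freq_w1, freq_w2: unigram counts of word1 and word2
    freq_bigram:      count of the bigram 'word1 word2'
    n_bigrams:        N, the total number of bigrams in the sample
    """
    return log2((freq_bigram * n_bigrams) / (freq_w1 * freq_w2))

# A pair that co-occurs often relative to its parts scores positive:
print(pmi(50, 40, 30, 10000))     # positive

# Two very high frequency words that co-occur only once score negative:
print(pmi(5000, 4000, 1, 10000))  # negative
```

The second call illustrates exactly the situation discussed below: the unigram counts are huge, the bigram count is tiny, so the ratio inside the log falls below 1.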
So what does a negative value tell us? Well, it suggests
that word1 and/or word2 must be very high-frequency words
(the, and, a ... come to mind) that don't occur together
in the bigram under consideration especially often.
You can also look at the relationship
freq(word1,word2) < freq(word1)*freq(word2)/N
The right hand side of this inequality is the expected value
for the frequency count of the bigram 'word1 word2' under
the classical assumption of independence (which underlies
tests like Pearson's chi-squared and the log-likelihood ratio). So a
negative pointwise mutual information value tells us that the
observed frequency count for a bigram is less than we would
expect under the assumption that the words in the bigram
are independent.
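That expected count is easy to compute directly; here is a minimal sketch (function name and the sample counts are mine):

```python
def expected_count(freq_w1, freq_w2, n_bigrams):
    """Expected frequency of 'word1 word2' if the two words
    occurred independently of one another:
        freq(word1) * freq(word2) / N
    """
    return (freq_w1 * freq_w2) / n_bigrams

# Two common words (counts 5000 and 4000 out of N = 10000 bigrams)
# would be expected to co-occur 2000 times under independence:
observed = 1
expected = expected_count(5000, 4000, 10000)  # 2000.0
print(observed < expected)  # True -- so PMI is negative here
```

Whenever the observed count falls below this expected count, the ratio inside the PMI log is below 1 and the PMI value is negative, which is the inequality stated above.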
I have puzzled a bit over this notion of being 'less than
what would be expected under independence'. Does this just
mean that the words in the bigram are independent, or is
something further suggested? I'd be interested if anyone else
has some thoughts on that particular issue...
Anyways, I'm not sure how good a tool pointwise Mutual Information
is in the first place (see the Manning and Schutze text, for example,
for some reasons for concern) but it does raise some interesting
issues no doubt.
--- Ted Pedersen http://www.d.umn.edu/~tpederse
This archive was generated by hypermail 2b29 : Fri Mar 09 2001 - 01:16:40 MET