Apologies if this question either betrays a fundamental misunderstanding on
my part or is old hat.
If one is employing the log-likelihood ratio (or, similarly, chi-square) to
establish a significant difference in the use of a certain word in two
corpora, then as far as I understand it is calculated from a contingency
table based, for each corpus, on the frequency of the word, the frequency of
all other words, and the total number of words.
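To make the single-word case concrete, here is a minimal sketch of the calculation as I understand it, written in Python. The function name, the parameter names, and the counts in the usage note are illustrative assumptions of mine, not taken from any particular tool.

```python
import math


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Log-likelihood ratio (G2) for one word in two corpora.

    freq_a, freq_b   -- occurrences of the word in corpus A and corpus B
    total_a, total_b -- total number of word tokens in each corpus
    (All names are illustrative assumptions.)
    """
    # 2x2 contingency table: rows are corpora, columns are
    # "the word" versus "all other words".
    observed = [
        [freq_a, total_a - freq_a],
        [freq_b, total_b - freq_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [freq_a + freq_b, (total_a - freq_a) + (total_b - freq_b)]
    grand_total = total_a + total_b

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            # Expected count if the word were equally frequent in both corpora.
            e = row_totals[i] * col_totals[j] / grand_total
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / e)
    return 2.0 * g2
```

With one degree of freedom, a value above about 3.84 corresponds to p < 0.05. For example, log_likelihood(50, 1000, 10, 1000) is well above that threshold, while log_likelihood(10, 1000, 20, 2000) is zero because the relative frequencies are identical.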
However, how is this employed if we want to establish a significant
difference in the use of a multi-word unit (such as a two-word prepositional
phrase) in two corpora? The frequency of the multi-word unit is easy enough
to obtain, but what does "frequency of other words" become? Indeed, can the
log-likelihood ratio be used in this case at all? If not, what alternatives
are there?
Thanks in advance for any comments.
Hong Kong Polytechnic University