There's a hint in an old SPSS manual that for very large samples, even
minuscule deviations from the mean will generate statistically
significant differences. This relates to the power of the chi-square
goodness-of-fit test, which is difficult to determine analytically. The
point is made in:
George C. Canavos (1984), Applied Probability and Statistical Methods,
Little Brown & Co.
where you can also find:
"However, it can be shown that for extremely large sample sizes, it is
almost certain to reject the null hypothesis because one would not be
able to specify H0 close enough to the true distribution. Thus the
application of chi-square is questionable when extremely large sample
sizes are involved."
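To see the effect concretely, here is a small sketch (my own illustration, not from the book): the same tiny relative deviation from a hypothesised 50/50 split is nowhere near significant at N = 1,000, but wildly significant at N = 10,000,000. The function names and the 50.5%/49.5% split are chosen for illustration only.

```python
# Pearson's chi-square goodness-of-fit statistic, computed directly.
# The critical value for df=1 at alpha=0.05 is about 3.841.

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def stat_for(n):
    # Observed proportions 50.5% / 49.5% vs. a hypothesised 50/50 split:
    # the relative deviation is identical at every sample size.
    observed = [0.505 * n, 0.495 * n]
    expected = [0.500 * n, 0.500 * n]
    return chi_square(observed, expected)

print(stat_for(1_000))       # 0.1    -> below 3.841, H0 not rejected
print(stat_for(10_000_000))  # 1000.0 -> H0 rejected, same tiny deviation
```

The statistic here works out to 0.0001 * N, so it grows linearly with sample size even though the underlying deviation does not change.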
The question is how large "extremely large" can be in computational
linguistics. A text corpus of a few million words can sometimes be too
small to describe a variety of phenomena in natural language, yet the
same corpus (as a sample) may be too large for the application of
chi-square (other tests may have a similar problem too).
As a remedy, several statistics books propose the (not widely used) phi
coefficient, which compensates for sample size:
phi = square_root(chi-square / N)   (N = sample size)
Phi takes the value 0 when no relationship exists between the dependent
and independent variables and 1 when the variables are perfectly
related. Phi is not a test, however; it is just an association
coefficient, so it is up to the experimenter to decide what threshold
value indicates a true relationship between the variables.
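A quick sketch of the compensation at work (again my own illustration): dividing the chi-square statistic by N before taking the square root cancels the linear growth seen above, so phi stays constant across sample sizes for the same relative deviation.

```python
import math

def chi_square(observed, expected):
    # Pearson's chi-square goodness-of-fit statistic.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def phi(observed, expected, n):
    # phi = sqrt(chi-square / N): the sample-size-corrected coefficient.
    return math.sqrt(chi_square(observed, expected) / n)

# The same 50.5% / 49.5% split vs. a 50/50 hypothesis, at growing N.
for n in (1_000, 1_000_000, 100_000_000):
    obs = [0.505 * n, 0.495 * n]
    exp = [0.500 * n, 0.500 * n]
    print(n, phi(obs, exp, n))  # phi is 0.01 at every sample size
```

Whether 0.01 counts as a "true" relationship is exactly the threshold judgment the post says is left to the experimenter.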
Best,
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
George Demetriou
Dept. of Computer Science Room: 219
The University of Sheffield Tel: +44 (0) 114 22 21894
Regent Court FAX: +44 (0) 114 278 1810
211 Portobello Street e-mail: demetri@dcs.shef.ac.uk
Sheffield, S1 4DP, UK
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%