Re: Corpora: multilingual texts

Ted E. Dunning
Tue, 2 Dec 1997 12:36:16 -0800

I did some work on language identification and have an evaluation
corpus available for anybody who wants to try their hand. This corpus
was developed by taking random samples from a Spanish/English parallel

The test corpus comes with a technical report and working code, both
somewhat outdated.

You can ftp the 1995 version of the test corpus/paper/code from

If you want the latest description and code, please email me.