Corpora: [Q] Statistics on average sentence length ratios across

Oliver Christ
Tue, 14 Oct 1997 19:54:17 +0200

Dear Colleagues,

I am looking for tables which list some kind of "standard ratio" between
average sentence lengths in words and characters across various language
pairs, e.g. "German : English - #words 1.2 #chars 1.16" or so (these
were randomly chosen figures ;-) ). I assume that these figures are specific
to the text type, so it would be difficult to give accurate figures for the
"general" case, but some "average values" would at least be fine as a
starting point. The figures could easily be computed, e.g. from large,
balanced (or, preferably, parallel) corpora with marked sentence
boundaries, but first, I don't have such corpora at hand, and second, I
need these figures for a lot of language pairs (as many as possible).
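For illustration, here is a minimal sketch of how such ratios could be computed once sentence-segmented text for two languages is available. It assumes that words are whitespace-delimited tokens and that "characters" means non-whitespace characters; both are simplifying assumptions (languages without spaces between words, or with heavy compounding such as German, would need a proper tokenizer). The function names `avg_lengths` and `length_ratios` are hypothetical.

```python
def avg_lengths(sentences: list[str]) -> tuple[float, float]:
    """Return (average words per sentence, average characters per sentence).

    Assumption: words are whitespace-delimited tokens, and characters are
    counted without whitespace.
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    n = len(sentences)
    words = sum(len(s.split()) for s in sentences)
    chars = sum(sum(1 for ch in s if not ch.isspace()) for s in sentences)
    return words / n, chars / n


def length_ratios(src: list[str], tgt: list[str]) -> tuple[float, float]:
    """Ratio of average sentence lengths, source : target, in words and chars."""
    src_words, src_chars = avg_lengths(src)
    tgt_words, tgt_chars = avg_lengths(tgt)
    return src_words / tgt_words, src_chars / tgt_chars


if __name__ == "__main__":
    # Toy German/English sample; real figures would need large corpora.
    de = ["Das Haus ist alt.", "Wir gehen heute nach Hause."]
    en = ["The house is old.", "We are going home today."]
    w, c = length_ratios(de, en)
    print(f"German : English - #words {w:.2f} #chars {c:.2f}")
```

With a sentence-aligned parallel corpus, both sides have the same number of sentences, so these ratios reduce to ratios of total word and character counts.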

If you have pointers to any relevant information, please email me
directly; I'll post a summary here, of course.

Have a nice day,


PS: If there is any garbage following the text of this mail: I just
haven't found a way yet to tell Outlook'97 not to include it ;-)