We are interested in the average difference in length between texts and
their translations (parallel texts). We would like data for all eleven
official European Union languages, but especially for the language pair
English-Spanish. We want to use this (and further) information to
automatically identify translations of a given text in a larger text
collection.
Text length differences could be expressed either as a number of words or
as a number of characters. In our own sublanguage corpus, Spanish texts use
about 13% more characters than their English equivalents, but we would like
to have information pertaining to texts other than our own.
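
To make the two measures concrete, here is a minimal sketch in Python of how
such a ratio could be computed over a set of aligned text pairs. The function
names `text_length` and `length_ratio` and the two sample sentence pairs are
purely illustrative assumptions, not part of any existing tool or of our
corpus. Words are taken to be whitespace-separated tokens, which is only a
rough approximation.

```python
def text_length(text, unit="chars"):
    """Length of a text in characters or in whitespace-separated words."""
    if unit == "chars":
        return len(text)
    if unit == "words":
        return len(text.split())
    raise ValueError("unit must be 'chars' or 'words'")


def length_ratio(pairs, unit="chars"):
    """Total target length divided by total source length.

    `pairs` is a list of (source_text, target_text) tuples.
    A result of 1.13 would mean the target texts are 13% longer.
    """
    source_total = sum(text_length(src, unit) for src, _ in pairs)
    target_total = sum(text_length(tgt, unit) for _, tgt in pairs)
    if source_total == 0:
        raise ValueError("source texts are empty")
    return target_total / source_total


# Hypothetical English-Spanish sample pairs, invented for illustration only.
sample_pairs = [
    ("The committee approved the report.",
     "El comité aprobó el informe."),
    ("Member States shall take the necessary measures.",
     "Los Estados miembros adoptarán las medidas necesarias."),
]

print("character ratio:", length_ratio(sample_pairs, "chars"))
print("word ratio:", length_ratio(sample_pairs, "words"))
```

Summing the lengths before dividing weights long texts more heavily than
short ones; averaging the per-pair ratios instead would be an equally
defensible choice and may give a different figure.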
Thanks in advance for any help with this. I shall send a summary of the
responses to the list.
Joint Research Centre - Ispra site (http://www.jrc.it/langtech/)