Dear list members,
thanks to all who replied to my query about sentence aligner scripts:
Susan Armstrong, Torgny Rasmark, Jean Veronis, François Maniez, Tomaz Erjavec.
Below are the replies that I got:
We have a publicly available aligner made for an EU project some years
ago, available at http://www.issco.unige.ch/tools/
Vanilla aligner (for DOS):
Gale, W., and Church, K. (1993) "A Program for Aligning Sentences in
Bilingual Corpora," Computational Linguistics, 19:1, pp. 75-102.
There is a C program published at the end of the paper. It is available
from Ken's page at:
This is not about Perl or Unix, but I have written a Word macro that does
the trick if the original format of your data is an x-column table where x
is the number of languages included in your parallel corpus (I am currently
building a medical corpus from files available on the European Commission
website in English, French, German, Italian, Spanish and Portuguese, in
order to test terminological extraction algorithms).
The output of the macro needs to be manually corrected, as one sentence will
occasionally be translated as two sentences and vice versa.
Let me know if you're interested, and I'll send it along.
Maître de Conférences
Centre de Recherche en Terminologie et en Traduction
Département de Langues Étrangères Appliquées
Université Lumière Lyon 2
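
The table-based approach described in the reply above can be illustrated with a short sketch. This is not the Word macro itself, which was not posted; it is a minimal Python illustration that assumes each table row has already been read into a list of cells, one cell per language. The function names (split_sentences, align_rows), the naive sentence splitter, and the sample rows are all hypothetical. Rows where the languages disagree on the number of sentences are flagged for the manual correction mentioned above rather than aligned automatically.

```python
import re

# Naive sentence splitter: break on whitespace that follows ., ! or ?.
# This is an assumption for illustration; real text needs abbreviation handling.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(cell):
    """Split the text of one table cell into a list of sentences."""
    return [s for s in _SENT_END.split(cell.strip()) if s]


def align_rows(rows):
    """Align sentences row by row.

    rows: list of rows, each a list of cells (one cell per language).
    Returns (aligned, flagged): aligned is a list of sentence tuples taken
    from rows where every language has the same number of sentences;
    flagged lists the indices of rows that need manual correction.
    """
    aligned, flagged = [], []
    for i, row in enumerate(rows):
        per_lang = [split_sentences(cell) for cell in row]
        counts = {len(sentences) for sentences in per_lang}
        if len(counts) == 1:
            # Same sentence count in every language: pair them up in order.
            aligned.extend(zip(*per_lang))
        else:
            # e.g. one sentence translated as two: leave for a human.
            flagged.append(i)
    return aligned, flagged


# Hypothetical two-language sample (English, French).
rows = [
    ["Take one tablet daily. Do not exceed the dose.",
     "Prendre un comprimé par jour. Ne pas dépasser la dose."],
    ["Keep out of reach of children.",
     "Tenir hors de portée. Ne pas laisser aux enfants."],
]
aligned, flagged = align_rows(rows)
```

In this sample the first row yields two aligned sentence pairs, while the second row (one English sentence against two French ones) is only flagged, which is the case that requires manual correction.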
Vanilla can also be found at
complete with an accompanying paper and free to download!
There is a version of the Vanilla aligner, pre-compiled for DOS, on the
It is possible to download a compressed archive from there, but, as I
don't understand Swedish (assuming it is Swedish...), I don't know if
there are any restrictions on its use.
Also, if you go to Kenneth Church's publications page, you can download the
text version of
Gale, W., and Church, K. (1993) "A Program for Aligning
Sentences in Bilingual Corpora," Computational Linguistics, 19:1, pp.
which contains the source code for their famous aligner as an appendix.
Dr Tony Berber Sardinha
(Catholic University of Sao Paulo, Brazil)