> Is there anyone who has experience in annotating corpora using XML?
You should look at the Corpus Encoding Standard (CES) at
The specification is currently in SGML, but we are in the process of
converting it to XML. It provides encoding specifications for logical
structure (chapter, paragraph, etc.) and sub-paragraph elements
(sentences, names, dates, etc.), as well as extensive specifications
for part-of-speech and alignment annotation.
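
To give a rough idea of what this layered markup looks like once it is
in XML, here is a small sketch in Python. The element and attribute
names (text, chapter, p, s, name, date, id, type) and the sample
sentences are invented for illustration; they are assumptions and not
taken from the CES DTD, so check the specification for the real tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated fragment: the tag names below are illustrative
# only and are NOT the actual CES element names.
SAMPLE = """\
<text>
  <chapter n="1">
    <p>
      <s id="s1"><name type="person">Ada Lovelace</name> wrote the notes in <date>1843</date>.</s>
      <s id="s2">They were published in <name type="place">London</name>.</s>
    </p>
  </chapter>
</text>
"""

root = ET.fromstring(SAMPLE)

# Sentence level: recover the plain text of each sentence by joining
# the text of the element and of everything nested inside it.
sentences = [(s.get("id"), "".join(s.itertext())) for s in root.iter("s")]

# Sub-sentence level: pull out the annotated names and dates.
names = [(n.get("type"), n.text) for n in root.iter("name")]
dates = [d.text for d in root.iter("date")]

for sid, text in sentences:
    print(sid, text)
print(names)
print(dates)
```

The same document carries both the logical structure and the
finer-grained annotation, so each layer can be extracted on its own
without disturbing the others.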
The move to XML may well affect the CES considerably, given the
potential uses of XSL transformations to manipulate this kind of data.
However, in its current form the CES is already very usable for many
kinds of corpus annotation.
Professor and Chair
Department of Computer Science, Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 914 437-5988 Fax: +1 914 437-7498
Equipe Langue et Dialogue, LORIA/CNRS
Campus Scientifique - BP 239
54506 Vandoeuvre-lès-Nancy FRANCE
Tél: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79