For instance, the Corpus Encoding Standard are rewriting their guidelines to
There are some tools that support XML-tagged texts (e.g. LTG in Edinburgh,
Alembic Workbench), and there was a discussion about the problems that arise
when converting SGML into XML.
One positive suggestion was "to store the raw text without the annotations and
keep the annotations separately in a compact format which refers back to the
raw text". This, however, requires that XLink and XPointer work, and that
has caused some problems with, e.g., the parallel corpus in Oslo, since
ordinary browsers do not support them.
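The quoted idea of keeping annotations apart from the raw text can be
sketched without XLink or XPointer, using plain character offsets as the
back-reference. This is only a minimal illustration under stated assumptions:
the sample sentence, the tag names and the helper names (Annotation,
covered_text, to_inline) are all made up for the example and do not describe
how any existing corpus or tool works.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Annotation:
    """One stand-off annotation pointing back into the raw text (hypothetical format)."""
    start: int  # offset of the first covered character (inclusive)
    end: int    # offset just past the last covered character (exclusive)
    tag: str    # illustrative label, e.g. "name"


def covered_text(raw: str, ann: Annotation) -> str:
    """Return the stretch of raw text that the annotation refers to."""
    return raw[ann.start:ann.end]


def to_inline(raw: str, annotations: list[Annotation]) -> str:
    """Merge non-overlapping annotations back into the raw text as XML tags."""
    out = []
    pos = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < pos:
            raise ValueError("overlapping annotations are not handled in this sketch")
        out.append(escape(raw[pos:ann.start]))  # untagged text before the span
        out.append(f"<{ann.tag}>{escape(covered_text(raw, ann))}</{ann.tag}>")
        pos = ann.end
    out.append(escape(raw[pos:]))  # untagged text after the last span
    return "".join(out)


# The raw text is stored untouched; the annotations live in a separate list.
RAW = "Anna lives in Oslo."
ANNOTATIONS = [Annotation(0, 4, "name"), Annotation(14, 18, "place")]

if __name__ == "__main__":
    print(to_inline(RAW, ANNOTATIONS))
```

The raw text stays free of tags, and an inline-tagged version can be
regenerated when a tool needs one. The price is that the offsets become
invalid as soon as the raw text is edited.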
The question is, has anyone really used XLink and XPointer? How are the
text header and the texts connected otherwise? Is there any other way of
keeping the size of the tags in the text to a minimum?
VILT, Nordisk Språkteknologi AS
Postboks 93, 5701 Voss, Norway
+47 56 52 20 17