Corpora: tagger summary

Jean Hudson
Tue, 25 Nov 1997 14:01:14 +0000

Thanks to all who supplied information on the evaluation of taggers.
Here is a summary of the replies, and some comments, from Andrew Harley:

Last year, we carried out a test on 4 taggers: the Prospero "Parser"
(telephone Mike Oakes on 0181-741-8531 for details), one from John Carroll
at Brighton University, an old ACQUILEX tagger written by David Elworthy at
Cambridge University, and our internal sense tagger. No ambiguous or
unknown tags were permitted, punctuation tags were certainly not counted
(unlike in some other scores given in the literature!), and we had
strict rules about coding participles as attributive adjectives when
that was the function they were performing in the sentence. This is
rather unfair on the taggers, but it reflected the results that we
wanted for our corpora. The accuracy rates on a 4,000-word sample were
low, ranging from 87% to 90% (for approximately 50 tags), with Prospero
coming out top.
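
As a rough illustration of the scoring scheme described above -- the
rule of excluding punctuation tags from the count is from the text, but
the tag names and tokens below are invented:

```python
# Sketch of an accuracy calculation that does not count punctuation
# tags, as in the evaluation described above. Tags are invented.
PUNCT_TAGS = {"PUNC"}

def accuracy(gold, predicted):
    """Percentage of non-punctuation tokens tagged correctly."""
    scored = [(g, p) for g, p in zip(gold, predicted)
              if g not in PUNCT_TAGS]
    correct = sum(1 for g, p in scored if g == p)
    return 100.0 * correct / len(scored)

gold      = ["DET", "ADJ", "NOUN", "VERB", "PUNC"]
predicted = ["DET", "NOUN", "NOUN", "VERB", "PUNC"]
# The punctuation token is excluded, so 3 of 4 scored tokens are
# correct: 75% rather than the 80% a naive count would give.
print(accuracy(gold, predicted))  # -> 75.0
```

Counting (trivially correct) punctuation tags inflates the headline
figure, which is why the text flags scores that include them.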

Jochen Leidner <> considered this a
serious issue and provided a lot of helpful information. Being unaware
of any systematic studies in the field, he set out to undertake just
such an analysis himself. The technical report that contains the tagged
data is available from
and the data files from

Philip Bralich <> agreed that there were very few studies,
suggesting only the MUC conferences at

Eric Atwell <> has *nearly* finished a paper comparing
accuracy rates etc., for submission to Computer Speech and Language
(special issue on evaluation). His gut feeling is that there is little
difference in accuracy: most taggers work at about 90-95%, depending on
tagset, language genre, and application-dependent factors. He recommends
not his own tagger but (i) the English Constraint Grammar
tagger/semi-parser at Helsinki, which in addition to PoS categories
marks subject, object, and some dependency relations; and (ii) Alex
Fang's AUTASYS tagger and ICE parser, which add PoS tags and full parse
trees according to the ICE markup scheme. However, this isn't really
based on "official tests", just personal assessments...

Klas Prytz <> has done some evaluation of the ENGlish
Constraint Grammar (ENGCG): recall seems quite high, but precision is
much lower. No official paper yet.
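
Recall and precision can come apart like this when a tagger is allowed
to leave more than one candidate tag per word rather than forcing a
single choice, as a Constraint Grammar system can. A minimal sketch of
the two figures (the tags and data below are invented):

```python
# For a tagger that may return several candidate tags per token,
# recall counts tokens whose correct tag survives among the candidates;
# precision divides the same hits by the total number of tags returned.
def recall_precision(gold, candidates):
    hits = sum(1 for g, c in zip(gold, candidates) if g in c)
    returned = sum(len(c) for c in candidates)
    recall = 100.0 * hits / len(gold)
    precision = 100.0 * hits / returned
    return recall, precision

gold = ["DET", "NOUN", "VERB"]
candidates = [{"DET"},                     # fully disambiguated
              {"NOUN", "VERB"},            # two readings left
              {"VERB", "NOUN", "ADJ"}]     # three readings left
r, p = recall_precision(gold, candidates)
print(r, p)  # recall 100.0, precision 50.0
```

Every correct tag survives (recall 100%), but only 3 of the 6 tags
returned are correct (precision 50%) -- the pattern Prytz describes.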

Djoerd Hiemstra <> reported that Martin Rajman
<> of EPFL (Swiss Federal Institute of Technology in
Lausanne, Switzerland) is working on a large-scale comparison of taggers
and parsers for POS-tagging, which he thinks will be published next
January.

Leidner also comments that evaluating the accuracy of taggers and
parsers is very difficult, because there is a lot of diversity with
respect to tagset size (some tagsets are rather crude, others include
subcategorization information or even semantic subclasses), so n%
correctness using tagset A may still be worse than (n-1)% correctness
using a more detailed tagset B. The AMALGAM project at ULeeds is
concerned with mapping between different annotation models.
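
Leidner's comparability point can be made concrete: collapsing a
fine-grained tagset onto a coarser one raises the measured accuracy of
the very same tagger output. A minimal sketch (the tags and the mapping
below are invented for illustration):

```python
# Mapping a fine tagset onto a coarse one makes identical tagger output
# look more accurate, so n% on tagset A need not beat (n-1)% on B.
FINE_TO_COARSE = {"NN1": "N", "NN2": "N", "VVD": "V", "VVG": "V"}

def accuracy(gold, pred):
    return 100.0 * sum(g == p for g, p in zip(gold, pred)) / len(gold)

gold = ["NN1", "VVD", "NN2", "VVG"]
pred = ["NN2", "VVD", "NN2", "VVD"]  # two fine-grained errors

fine = accuracy(gold, pred)
coarse = accuracy([FINE_TO_COARSE[t] for t in gold],
                  [FINE_TO_COARSE[t] for t in pred])
print(fine, coarse)  # 50.0 on the fine tagset, 100.0 on the coarse one
```

Both fine-grained errors confuse tags within the same coarse class, so
they vanish entirely under the coarser scheme.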

The question of speed is usually not properly addressed in the
literature, because in most cases no detailed information about the
hardware is given (SPECint95 rating, memory size, user mode, ...).
Dimitrios Kokkinakis <> reported that Cooke's semanTag on
Swedish is 9 times faster than the Brill tagger.
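
As a hedged sketch of the kind of reporting the paragraph asks for, one
might print the hardware/software context alongside any timing figure
(the `tag` function below is a stand-in, not a real tagger):

```python
import platform
import time

def tag(tokens):
    """Stand-in for a real tagger; returns one invented tag per token."""
    return ["NOUN"] * len(tokens)

tokens = ["word"] * 100000
start = time.perf_counter()
result = tag(tokens)
elapsed = time.perf_counter() - start

# Report the measurement context along with the number itself, so that
# speed figures from different papers can at least be interpreted.
print(f"machine: {platform.machine()}, "
      f"python: {platform.python_version()}")
print(f"tagged {len(tokens)} tokens in {elapsed:.4f}s")
```

A raw "9 times faster" is only meaningful when both runs are tied to a
described machine and configuration in this way.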

SOME WEB POINTERS (Jochen Leidner)
At you can test
EngCG-2, IMHO a high-quality, rule-based parser (by Lingsoft).
The BRILL-TAGGER is available via FTP at
The XEROX-TAGGER is available via anonymous FTP at
For morphological analysis, you can download either PC-KIMMO 2
from or Malaga from (both without
linguistic descriptions).
For info on the "AD ENGLISH LEMMATIZER" contact Bruno Maximilian
Schulze (IMS Stuttgart) <>
The ENGTWOL tagger and lemmatizer can also be bought from Lingsoft,

Tapanainen, Pasi and Atro Voutilainen: "Tagging accurately - Don't
guess if you know." In Proceedings of the Fourth Conference on
Applied Natural Language Processing (ANLP'94), pp. 47-52,
Stuttgart, Germany, 1994.
Samuelsson, Christer and Atro Voutilainen: "Comparing a Linguistic and
a Stochastic Tagger." In Proceedings of the 35th Annual Meeting of
the Association for Computational Linguistics, pp. 246-253, ACL,
1997. [Also available online as cmp-lg/9706005.]
Black, E., S. Abney, D. Flickinger, C. Gdaniec, R. Grishman,
P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans,
M. Liberman, M. Marcus, S. Roukos, B. Santorini, and
T. Strzalkowski: "A procedure for quantitatively comparing the
syntactic coverage of English grammars." In Proceedings of the
Fourth DARPA Speech and Natural Language Workshop, Pacific Grove,
California, February 1991. Morgan Kaufmann.
Harrison, P., S. Abney, E. Black, D. Flickinger, C. Gdaniec,
R. Grishman, D. Hindle, R. Ingria, M. Marcus, B. Santorini, and
T. Strzalkowski: "Evaluating syntax performance of parser/grammars
of English." In Proceedings of the Workshop on Evaluating Natural
Language Processing Systems, Association for Computational
Linguistics, 1991.
Hausser, Roland (ed.): "The coordinator's final report on the first
Morpholympics." LDV-Forum, 11(1):54-64, 1994. Available via
Cole, Ronald A. (ed.): Survey of the State of the Art in Human
Language Technology, Chapter 13, e.g. at
Karlsson, F. et al. (eds) (1995): Constraint Grammar, esp.
pp. 269-83, 359.

Andrew Harley
Systems Manager - ELT Reference
Cambridge University Press

Direct line: (01223)325880