Quoting Afsaneh Fazly <email@example.com>:
> I need to build a part-of-speech tagger for a new language
> (for which there is no PoS-tagger available). For this, I need
> to hand-annotate a minimum amount of text. I would like to know
> how much text (minimum of course) I need to hand-tag. Also,
> for this much text, what is the reasonable size of the tagset
> used for annotation?
this is a question about the sample complexity of POS tagging. CiteSeer is
overloaded right now, but this paper
Shlomo Argamon-Engelson and Ido Dagan (1999), Committee-Based Sample Selection
for Probabilistic Classifiers, Journal of Artificial Intelligence Research
is a good place to look.
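
to give a rough idea of what committee-based selection means in practice: you train several tagger variants on the little annotated text you have, let them all tag the unannotated pool, and hand-annotate the sentences where they disagree most. below is a minimal sketch of that selection step, assuming vote entropy as the disagreement measure and per-sentence averaging. the function names and the data layout (a list of tag votes per token) are my own illustration, not code from the paper, and sentences are assumed to be non-empty.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the tags a committee proposed for one token."""
    counts = Counter(votes)
    k = len(votes)
    return -sum((c / k) * math.log2(c / k) for c in counts.values())


def select_for_annotation(committee_votes, n):
    """Return indices of the n sentences the committee disagrees on most.

    committee_votes[i][j] is the list of tags (one per committee member)
    proposed for token j of sentence i. Hypothetical layout, for illustration.
    """
    def sentence_score(sentence):
        # Average disagreement over the tokens of a (non-empty) sentence.
        return sum(vote_entropy(v) for v in sentence) / len(sentence)

    ranked = sorted(
        range(len(committee_votes)),
        key=lambda i: sentence_score(committee_votes[i]),
        reverse=True,
    )
    return ranked[:n]
```

the point of ranking this way is that sentences on which all committee members agree are unlikely to teach the tagger much, so the hand-tagging budget goes to the ones that are still uncertain.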
also, at this year's CoNLL, there was a paper on creating a POS tagger in a