I don't know any Tamil, or anything about the morphology of Tamil. But
I infer from the above statement that Tamil has somewhat complicated
verb conjugation, and therefore requires some morphological analysis
prior to tagging.
Kemal Oflazer has done work on part of speech tagging in Turkish, and
some of the tools he's used may be appropriate. From
http://www.cs.bilkent.edu.tr/~ko/pubs.html I find:
Kemal Oflazer, "Morphological Analysis", chapter in Syntactic Wordclass
Tagging, Hans van Halteren, Editor, Kluwer Academic Publishers, 1998.
Kemal Oflazer and Gökhan Tür, "Morphological Disambiguation by Voting
Constraints", in Proceedings of ACL'97/EACL'97, The 35th Annual Meeting of
the Association for Computational Linguistics, July 7-12, 1997, Madrid.
Another possibility to look at is PC-KIMMO, a two-level morphological
analyser. See http://www.sil.org/pckimmo/.
> And I would like to limit
> the question of POS tagging currently only to the verbs.
One question I have is: how ambiguous is (written) Tamil with respect to
part of speech? That is, are there (frequent) cases of words such as
English "present", which can be (for example) both a noun and a verb? If
written Tamil is unambiguous, you may not need a statistical
disambiguation step; the morphological analysis alone may be enough.
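One rough way to answer the ambiguity question, assuming you have (or can
hand-tag) a small sample of text as (word, tag) pairs, is to count how many
word forms occur with more than one tag. The sketch below is only an
illustration: the function name pos_ambiguity, the tag labels, and the
English sample are my own placeholders, not part of any existing tool, and
a real test would use Tamil data after morphological analysis.

```python
from collections import defaultdict


def pos_ambiguity(tagged_tokens):
    """Return (type_rate, token_rate) for an iterable of (word, tag) pairs.

    type_rate:  fraction of distinct word forms seen with more than one tag.
    token_rate: fraction of running tokens whose word form is ambiguous.
    """
    tags_by_word = defaultdict(set)
    count_by_word = defaultdict(int)
    for word, tag in tagged_tokens:
        tags_by_word[word].add(tag)
        count_by_word[word] += 1

    total_tokens = sum(count_by_word.values())
    if total_tokens == 0:
        return 0.0, 0.0

    ambiguous = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_by_word)
    token_rate = sum(count_by_word[w] for w in ambiguous) / total_tokens
    return type_rate, token_rate


# Hypothetical hand-tagged sample (English, for illustration only).
sample = [
    ("the", "DET"), ("present", "NOUN"), ("was", "VERB"), ("wrapped", "VERB"),
    ("they", "PRON"), ("present", "VERB"), ("the", "DET"), ("award", "NOUN"),
]

type_rate, token_rate = pos_ambiguity(sample)
print(f"ambiguous word forms: {type_rate:.2%}, ambiguous tokens: {token_rate:.2%}")
```

If both rates come out near zero on a reasonably sized sample, that would
suggest the morphological analysis alone is enough; if the token rate is
substantial, a disambiguation step is probably worth the effort.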
Gregory Aist, firstname.lastname@example.org Ph.D. student, LTI, Carnegie Mellon
Project LISTEN: kids read, computer listens. http://www.cs.cmu.edu/~listen
Postal address: LTI, CMU, 4910 Forbes Ave., Pittsburgh PA 15213-3720 USA