I am a student of Indology and Tamil Studies, and I am trying to find out
how I could create a POS tagger for Tamil. Answering this question,
theoretically or practically, is part of my M.A. thesis.
We have, or will very soon have, a corpus of modern Tamil literature. We
are currently encoding the texts according to the Corpus Encoding Standard
(CES) at level 1, which encodes the overall structure of the texts.
As far as I know, not much has been done for Tamil in this direction,
and we don't have any already annotated corpus that we could use as
I have written a small macro that can, to some extent, recognize Tamil
verbs in their simplest conjugations. For now, I would like to limit
the question of POS tagging to verbs only. This macro was, however,
just a small test to give me an idea.
Where and how should I begin?
I am sorry for bothering you with such a novice question, but any help
or suggestions would be highly appreciated.
With best wishes,