Tim Buckwalter wrote:
> The big difference between Arabic and accented languages such as Spanish
> in this regard is that accent-less Spanish is probably sub-standard or
> at least informal orthography. Whereas it is the norm for an entire
> formal Arabic newspaper to have only a dozen or so thoughtfully-placed
> short vowels & diacritics, an unaccented Spanish newspaper would be hard
> to imagine (I've never seen one, at least), or one with accents placed
> only where there is not enough context to know what is intended.
So, the picture is (in a very black-and-white version): the
Spanish have fewer diacritics (both types and tokens) but use
them virtually all the time, and the Arabs have a lot more of
them, but they hardly ever use them.
I have three questions:
- does this difference have any measurable effect on the
  learning process (for native speakers who learn to read)?
- same for parsing and processing by humans
- same for NLP
Any pointers to any empirical data?
I realize that we are now really moving away from this list's
core business, so I'll be happy to continue this discussion
somewhere else if people prefer that.
[ One place to go could be the email list
that we have just set up for discussing Arabic NLP and Speech
processing issues, but that hasn't been officially launched
yet. Subscription is already open at
This archive was generated by hypermail 2b29 : Tue Apr 24 2001 - 00:19:33 MET DST