> I'm looking for a list of high-frequency foreign words found in
> English text, e.g. words like "cafe" (with an acute accent over the final e),
> "resume" (acute accent over both e's) and facade (where c == c-cedilla), etc.
Perhaps you could give a more specific description of what you mean by
"foreign words". I'm not sure I'd classify any of your examples as foreign:
"facade" in particular has been used in English for about 400 years, and the
other two for nearly 200 years.
If you mean words that occur frequently in English and are sometimes spelled
with diacritics or letters other than A to Z, I'd suggest harvesting a
machine-readable dictionary for such words. For example, Webster's has the entry
fa·cade also fa·çade
indicating that facade can be spelled with or without the cedilla in English.
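A minimal sketch of the harvesting step in Python. It assumes the dictionary headwords have already been extracted as plain Unicode strings, without syllable marks such as the "·" in the entry above. Each dictionary has its own file format, so the sample list below is hypothetical, as are the function names `fold_to_ascii` and `diacritic_headwords`. The idea is to keep any headword that contains a character outside ASCII and group it under its accent-stripped spelling.

```python
import unicodedata


def fold_to_ascii(word: str) -> str:
    """Drop diacritics, e.g. 'façade' -> 'facade'.

    NFKD splits a precomposed letter such as 'ç' into 'c' plus a combining
    cedilla; the combining marks are then filtered out.
    """
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_headwords(headwords: list[str]) -> dict[str, set[str]]:
    """Group headwords containing non-ASCII characters by their folded form."""
    variants: dict[str, set[str]] = {}
    for word in headwords:
        # Compose 'e' + combining acute into a single 'é' before checking.
        word = unicodedata.normalize("NFC", word)
        if any(not ch.isascii() for ch in word):
            variants.setdefault(fold_to_ascii(word).lower(), set()).add(word)
    return variants


# Hypothetical sample standing in for a harvested headword list.
headwords = ["facade", "façade", "cafe", "café", "résumé", "resume", "table"]
print(diacritic_headwords(headwords))
```

Plain ASCII headwords such as "table" are dropped, and "façade", "café" and "résumé" end up keyed by "facade", "cafe" and "resume".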
You could then get some frequency statistics from a corpus, and cut the list
at a reasonable threshold.
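A rough sketch of the counting-and-cutoff step, again in Python. It assumes the candidate words are given as a mapping from the plain ASCII spelling to its accented spellings. The corpus text, the function name `frequent_words` and the `min_count` cutoff are all hypothetical. Every spelling of a word counts toward the same total, and words below the cutoff are discarded.

```python
import re
import unicodedata
from collections import Counter


def frequent_words(corpus_text: str, variants: dict[str, set[str]],
                   min_count: int) -> list[tuple[str, int]]:
    """Count each word in the corpus under any of its spellings and keep
    those seen at least min_count times, most frequent first."""
    # Map every spelling (accented or plain ASCII) back to one key.
    spelling_to_key = {}
    for key, spellings in variants.items():
        for s in spellings | {key}:
            spelling_to_key[unicodedata.normalize("NFC", s).lower()] = key
    counts = Counter()
    # NFC first so that 'é' is one character and \w keeps it inside the token.
    text = unicodedata.normalize("NFC", corpus_text).lower()
    for token in re.findall(r"\w+", text):
        key = spelling_to_key.get(token)
        if key is not None:
            counts[key] += 1
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


variants = {"facade": {"façade"}, "cafe": {"café"}, "resume": {"résumé"}}
corpus = "The café had a new façade. Another cafe, another facade, one résumé."
print(frequent_words(corpus, variants, min_count=2))
```

One caveat: an accent-stripped form can collide with an ordinary English word (the verb "resume", for instance), so counts for the ASCII spelling may overstate the loanword. A relative cutoff, such as occurrences per million words, would also compare more fairly across corpora of different sizes than a raw count.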
- John Burger
The MITRE Corporation