AW: Corpora: List of abbreviations
Sabathy, Hellfried (Hellfried.Sabathy@bifab.de)
Wed, 6 May 1998 08:46:33 +0200
> "Manuel J. Maña López" wrote:
> > Hello,
> > I am looking for a list of abbreviations of common use in English
> (such as Ltd., Mr., Inc., ...). I have found some of them in Internet
> but they include a lot of acronyms I am not interested in.
> > Does anybody know if there is any available? Thanks.
> Pete Whitelock wrote:
>Why not just build your own? Presumably you are interested only
>in those which end in full stop. Go through a corpus and make a
>list of all strings followed by full stop.
>You have to be slightly careful cos some corpora don't use full
>stops on any abbreviations.
That is right: I looked for abbreviations in an encyclopedia, and
of all full stops were at sentence ends. Recognition of abbreviations is
easiest with a combination of rules:
- a lowercase letter after the full stop almost always means an abbreviation.
This found 70% of all abbreviations! The only exceptions are words like
"pnp-transistor" at the beginning of the next sentence.
- looking at the last 4 characters (in German) or last 3 in
English, there was a paper on this by Brustkern(?) in the
distinguish between "words" and "garbage". The garbage is the
rest of the abbreviations.
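
For illustration, the first rule (a lowercase letter after the full stop) can be sketched in a few lines of Python. This is only a rough sketch and not the program I used: the regular expression, the function name abbreviation_candidates and the whitespace-based tokenisation are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed pattern: a token that starts with a letter, ends in a full stop,
# and is followed by whitespace plus one more character. The lookahead
# captures that character without consuming it, so consecutive
# abbreviations are both seen.
CANDIDATE = re.compile(r"\b([^\W\d_][\w.-]*\.)(?=\s+(\S))")


def abbreviation_candidates(text):
    """Count tokens ending in a full stop that are followed by a lowercase letter."""
    counts = Counter()
    for match in CANDIDATE.finditer(text):
        token, next_char = match.group(1), match.group(2)
        if next_char.islower():
            counts[token] += 1
    return counts


if __name__ == "__main__":
    sample = ("Smith Ltd. was founded by Mr. Jones. "
              "It sells e.g. transistors. They are cheap.")
    for token, n in abbreviation_candidates(sample).most_common():
        print(token, n)
```

On the sample text this picks up "Ltd." and "e.g." but not "Mr.", because "Mr." is followed by a capital letter; that is why the rule needs to be combined with others. A sentence that starts with a lowercase word, such as "It failed. pnp-transistors are common.", makes "failed." a false candidate, which is the exception mentioned above.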