Re: Corpora: Corpus Linguistics User Needs

C Hogan
Wed, 29 Jul 1998 11:14:17 -0400

Henning Reetz writes:

> I don't have to be a car mechanic to drive a car. Why do
> I have to be a programmer to use a corpus?

The argument here turns on the meaning of the word "use": It is not
necessary to be a car mechanic if all you want out of your car is to
drive it to work, turn right and left, stop and accelerate, etc. On the
other hand, if you would like to put in a new engine, or tune up your
car, then yes, you do need to be an auto mechanic.

Similarly for corpus linguistics: if all you want to do is get word
counts from your corpus, then you can probably rely on existing software.
If, however, you want to do really custom stuff, then you should
probably learn to program.
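
To make the distinction concrete, here is a minimal sketch, written in
Python purely for illustration. The function names, the tokenizing rule
(runs of letters and apostrophes) and the sample sentence are all my own
assumptions, not part of any existing package. The first function is the
kind of plain word count that existing software already gives you; the
second is a small step toward "custom stuff", counting adjacent word pairs.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizing rule: lowercase the text and treat every run of
    # letters or apostrophes as one word. Real corpora usually need more care.
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    # Plain word frequencies, the sort of thing off-the-shelf tools provide.
    return Counter(tokenize(text))


def bigram_counts(text):
    # Counts of adjacent word pairs. Pairs are formed across sentence
    # boundaries too, because the tokenizer discards punctuation.
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat slept."  # made-up sample text
    print(word_counts(sample).most_common(2))
    print(bigram_counts(sample).most_common(1))
```

Once you want something like the second function, or anything further
from the beaten path, you are already writing a program rather than
using one.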

Here are my thoughts about what people should do about programming:

1. Learn Perl
- Perl is optimized for dealing with text
- Perl is interpreted, and easy to use
- Perl runs on the three main operating systems of merit:
Unix, MacOS and Windows.
- It's not elegant, but it is fairly forgiving

2. Produce libraries for Perl specific to corpus linguistics
- Although Perl is great for text, it would be nice to
have libraries for reading common formats and dealing
with such data. It is also apparent that a really
good statistics package is in order.
