Re: Corpora: Corpus of scientific texts

Lou Burnard
Fri, 23 Oct 1998 10:09:33 +0100 (BST)

On Fri, 23 Oct 1998, John Milton wrote:

|> If you're doing research in an EU country, you are entitled to work
|> with the BNC (I'm not sure what the current policy for the 'rest of the
|> world' is... Lou Burnard?). > David Lee.
|Can't help whining... the current policy for the rest of the world is that
|we still can't have the BNC.

I was planning to save this news up until BNC World Edition was
actually available, but just for the record, I am pleased to announce
that the DTI has now given us permission to distribute the BNC
worldwide. This means that as of now you can sign up for our online
service at http://sara.natcorp.ox, and search the BNC whatever
your geographical situation.

UNFORTUNATELY we are still unable to distribute copies of release 1.0
of the BNC itself outside the EU for the following reasons:

(1) There are a few texts in the corpus for which we were unable to
obtain world rights from their publishers and which therefore we
cannot distribute in their entirety with the rest of the corpus

(2) These texts can be removed from the online server, but not
(obviously) from the existing CD-ROMs

FURTHERMORE since release of BNC version 1.0 we have identified and fixed
(a) several errors in the indexing
(b) several errors in the text categorization and headers
(c) countless errors in the POS-tagging (the Lancaster team have completely
redone this over the last two years)

THEREFORE we are planning to delay worldwide release until completion
of a new corrected version, now well advanced. I am hoping to get this
done by January of next year, but regular readers of this list know
how reliably I predict dates.

I can also confirm that the long-awaited BNC sampler is now back on
schedule for production, and should appear by Christmas.

Lou Burnard