SEVENTH MESSAGE UNDERSTANDING SYSTEM EVALUATION
AND MESSAGE UNDERSTANDING CONFERENCE (MUC-7)
Evaluation: 2-6 March 1998
Conference: April 1998
Washington, D.C. area
The Human Language Systems Tipster Text Program of the
Defense Advanced Research Projects Agency
Information Technology Office
The Message Understanding Conferences have provided an ongoing
forum for assessing the state of the art and practice in text analysis
technology and for exchanging information on innovative computational
techniques in the context of fully implemented systems that perform
realistic tasks. The evaluations have provided researchers and
potential sponsors and customers with a quantitative means to
appreciate the strengths and weaknesses of the technologies, and the
results reported on at the conferences have sparked customer interest
in the potential utility of the technologies.
The Seventh Message Understanding Conference (MUC-7) will provide
an opportunity for both new and experienced MUC participants to
participate in a flexible evaluation, suited to development needs and
abilities. It will provide:
* Opportunity to select among a variety of tasks: Named Entity
(NE), Coreference (CO), Template Element (TE), Template
Relationship (TR) and Scenario Template (ST).
* Two tasks for evaluating component technologies (NE and
CO), which use Standard Generalized Markup Language (SGML)
as the output format.
* Redesigned Information Extraction (IE) task, with two
domain-independent subtasks (TE and TR) separated from
the domain-dependent subtask (ST).
* Emphasis in the ST task on portability and on minimizing
the human resources required to participate in the evaluation.
* Three experimental tracks to explore new data sets and tasks.
Participation in MUC-7 is actively sought from both new and
veteran organizations. With the new and redesigned evaluation tasks,
MUC-7 offers a good opportunity for organizations to try out new ideas
for handling NLP problems that are of both scientific and practical
interest without having to participate in the entire range of tasks.
The conference itself will consist primarily of presentations and
discussions of innovative techniques, system design, and test results.
There will also be an opportunity for participants to demonstrate their
evaluation systems. Attendance at the conference is limited to
evaluation participants and to guests invited by the DARPA Tipster Text
Program. A conference proceedings, including test results, will be
1 July 97: Application deadline for participation
15 July 97: Release of NE, CO, TE, TR, and example ST training
data and scorer
8 September 97: Release of Dry Run ST task definition,
training data, and scorer
29 Sept - 3 Oct 97: MUC-7 Dry Run (all participants)
6 February 98: Release of formal test ST task definition,
training data, and scorer
2-6 March 98: MUC-7 Formal Run
7-9 April 98: 7th Message Understanding Conference (tentative
DATA AND TASK DESCRIPTION:
The texts to be used for system development and testing are news
service articles from the New York Times News Service, supplied by the
Linguistic Data Consortium (LDC) [firstname.lastname@example.org]. Training, dry
run, and test data for all the tasks are extracted from a corpus of
approximately 158,000 articles. Sets of articles to be used in the
MUC-7 evaluation will be distributed via ftp upon payment of a one
time fee of $100 and upon signing of a user agreement for the use of
these texts. The user agreement can be retrieved from the LDC catalog
(Evaluation Agreements). The URL for the LDC home page is:
Five separate evaluations will be conducted as part of MUC-7.
Members of the MUC-7 Planning Committee have been working out the
definition of these evaluations since late 1996. The evaluations
may be viewed as capturing the results of text analysis at various
levels of aggregation of information:
* Named Entity (NE) requires only that the system under
evaluation identify each bit of pertinent information in
isolation from all others.
* Coreference (CO) requires connecting all references to
* Template Element (TE) requires grouping entity attributes
together into entity "objects."
* Template Relationship (TR) requires identifying relationships
between template elements.
* Scenario Template (ST) requires identifying instances of a
task-specific event and identifying event attributes,
including entities that fill some role in the event; the
overall information content is captured via interlinked
* Experimental tracks using new data sets are variants
of the NE task. The task definition is the same as for the
basic NE task, but the texts are different.
* Experimental track involving a new task is a simplified
version of the TE task.
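As an informal illustration of the higher levels of aggregation listed
above, the sketch below models template elements as entity "objects" and
template relationships as links between them. The class names, field
names, and sample values are hypothetical and chosen only for
illustration; they are not the official MUC-7 template definitions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TemplateElement:
    """An entity "object": attributes grouped under one identifier."""
    element_id: str
    entity_type: str  # hypothetical values, e.g. "ORGANIZATION" or "PERSON"
    names: List[str] = field(default_factory=list)
    descriptors: List[str] = field(default_factory=list)


@dataclass
class TemplateRelation:
    """A relationship plus the ids of the elements participating in it."""
    relation: str
    participants: Tuple[str, ...]


def dangling_references(elements, relations):
    """Return ids that some relation mentions but no element defines."""
    known = {e.element_id for e in elements}
    mentioned = {p for r in relations for p in r.participants}
    return sorted(mentioned - known)


# Made-up sample data: two entities and one relationship between them.
elements = [
    TemplateElement("E1", "ORGANIZATION", ["Acme Corp."], ["the manufacturer"]),
    TemplateElement("E2", "PERSON", ["Jane Doe"]),
]
relations = [TemplateRelation("EMPLOYEE_OF", ("E2", "E1"))]
```

A scenario-level structure could be layered on top in the same way, with
an event object whose role fillers are element ids.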
Key things to note about each evaluation task:
* NE covers named organizations, people, and locations, along
with date/time expressions and monetary and percentage
expressions; it requires production of SGML tags as output.
* CO covers noun phrases (common and proper) and personal
pronouns that are "identical" in their reference; it requires
production of SGML tags as output; the tags for coreferring
strings form "equivalence" classes, which are used for
* TE covers organizations, persons, and artifacts, which are
captured in the form of template "objects" consisting of a
predefined set of attributes.
* TR covers relationships among template elements, including
location and time relationships, which are captured in the
form of template "relations" consisting of a relationship
and the template elements participating in that
relationship. TR is a new task for MUC-7.
* ST covers a particular scenario, which is kept secret until
one month prior to testing in order to focus on system
portability; however, the generalized structure of a
scenario template is predefined, and example scenarios are
available for participants to examine. This task is domain dependent.
* Tasks for the experimental tracks are derived from NE and TE.
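To make the CO note above concrete, the following sketch groups
coreferring strings into equivalence classes with a union-find structure.
It assumes markup of the form <COREF ID="..." REF="..."> in which REF
points at the ID of an earlier mention; this tag layout and the sample
sentence are assumptions for illustration, and the released task
definition remains the authority on the actual annotation format.

```python
import re
from collections import defaultdict

# Opening COREF tags only; closing tags carry no linking information.
COREF_OPEN = re.compile(r'<COREF\s+([^>]*)>')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def coref_classes(sgml):
    """Group COREF ids into equivalence classes using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for match in COREF_OPEN.finditer(sgml):
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        mention = attrs["ID"]
        find(mention)  # register the mention even when it has no REF
        if "REF" in attrs:
            parent[find(mention)] = find(attrs["REF"])

    classes = defaultdict(set)
    for mention in list(parent):
        classes[find(mention)].add(mention)
    return sorted(sorted(members) for members in classes.values())


# Made-up sample text with four mentions.
text = (
    '<COREF ID="1">Acme Corp.</COREF> said '
    '<COREF ID="2" TYPE="IDENT" REF="1">it</COREF> hired '
    '<COREF ID="3">Jane Doe</COREF>, and '
    '<COREF ID="4" TYPE="IDENT" REF="2">the company</COREF> agreed.'
)
print(coref_classes(text))  # [['1', '2', '4'], ['3']]
```

Because links are merged transitively, mention 4 joins the class of
mention 1 even though it points only at mention 2.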
There is a World Wide Web site that allows automated testing
following the rules of MUC-6. It will be of particular value
to new participants. The website is password protected; to obtain a
password from email@example.com, you must be licensed to access the
ACL/DCI disk from the LDC. MUC-6 articles were taken from the
An anonymous ftp site will be available for downloading
MUC-7-related material. This CFP and the MUC-7 Participant Agreement are
available to the public from the ftp site. Each participant (after
signing the LDC User Agreement and a MUC-7 participation agreement)
will receive a password to download the MUC-7 data, definitions, and
scoring software at the release times noted above.
The URL of the website is http://muc.saic.com. The ftp site is
TEST PROTOCOL AND EVALUATION CRITERIA:
MUC-7 participants may elect to do one or any combination of
tasks and experimental tracks. Participants will have access to
shared resources such as the training texts and annotations/templates,
task documentation, and scoring software.
All MUC-7 participants are encouraged to participate in the dry
run and to take advantage of the available material.
The formal test will be conducted during the first week in March.
It will be carried out by the participants at their own sites in
accordance with a prepared test procedure and the results submitted to
the ftp site for official scoring with the software prepared by SAIC
Test sets used for the evaluations will consist of 100 texts,
with subsets for some of the tasks. There will be different data sets
for the dry run and the formal test.
Systems will be evaluated using recall and precision metrics (all
tasks), F-measure (all tasks), and error-based metrics (all tasks
except CO). The computation of these metrics is based on the scoring
categories of correct, partial, incorrect, spurious, missing, and
noncommittal. MUC-7 participants will be able to familiarize
themselves with the evaluation criteria by using the evaluation
software, which will be released along with the training
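For readers new to these metrics, the sketch below shows one way the
scoring categories are commonly combined in MUC-style scoring. It
assumes that a partial match earns half credit, that noncommittal
responses are left out of every count, and that the error-based metrics
follow the usual error-per-response-fill, undergeneration, and
overgeneration definitions. The function name and the sample counts are
hypothetical; the released scoring software defines the official
computation.

```python
def muc_scores(cor, par, inc, spu, mis, beta=1.0):
    """Compute MUC-style metrics from scoring-category counts.

    cor/par/inc/spu/mis are the numbers of correct, partial, incorrect,
    spurious, and missing fills. beta weights recall against precision
    in the F-measure (beta=1.0 weights them equally).
    """
    possible = cor + par + inc + mis  # fills in the answer key
    actual = cor + par + inc + spu    # fills the system produced
    credit = cor + 0.5 * par          # assumption: partial = half credit

    recall = credit / possible if possible else 0.0
    precision = credit / actual if actual else 0.0

    denom = beta ** 2 * precision + recall
    f_measure = (beta ** 2 + 1) * precision * recall / denom if denom else 0.0

    total = cor + par + inc + spu + mis
    error = (inc + 0.5 * par + mis + spu) / total if total else 0.0

    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "error_per_response_fill": error,
        "undergeneration": mis / possible if possible else 0.0,
        "overgeneration": spu / actual if actual else 0.0,
    }


# Made-up counts for a single hypothetical run.
scores = muc_scores(cor=6, par=2, inc=1, spu=1, mis=1)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these counts, recall and precision both come to 7/10, so the
balanced F-measure is also 0.7.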
INSTRUCTIONS FOR RESPONDING TO THE CALL FOR PARTICIPATION:
Organizations within and outside the U.S. are invited to respond
to this call for participation. By the time of the actual testing
phase of the evaluation, systems must be able to accept texts without
manual preprocessing, process them without human intervention, and
output annotations (NE, CO) or templates (TE, TR, ST) in the expected
Organizations should plan on allocating approximately two
person-months of effort for participation in the evaluation and
conference. It is understood that organizations will vary with
respect to experience with SGML text annotation, information
extraction, domain expertise/engineering, resources, contractual
demands/expectations, etc. Such factors will be taken into account
in any analyses of the results.
Organizations wishing to participate in the evaluation and
conference must respond by July 1, 1997 by submitting a short
statement of interest via email and a signed copy of the MUC-7
participation agreement via surface mail.
1. The statement of interest should be submitted via email
to firstname.lastname@example.org and should include the following:
a. Evaluation task(s) (choose one or more)
* Named Entity
* Coreference
* Template Element
* Template Relationship
* Scenario Template
b. Primary point of contact. Please include name, surface
and email addresses, and phone and fax numbers.
c. Does your site have a copy of the MUC-6 proceedings?
2. The participation agreement can be downloaded from the
anonymous ftp site (ftp.muc.saic.com). A signed copy should be sent by
surface mail to Elaine Marsh, NRL - Code 5512, 4555 Overlook Ave. SW,
Washington, D.C. 20375-5337, USA.
If you have questions that cannot wait until after the deadline for
responding to this call for participation, you may send
them by email to Elaine Marsh (email@example.com), WITH COPIES TO
Ralph Grishman (firstname.lastname@example.org) and Nancy Chinchor
(email@example.com) to ensure that your message receives a timely
response from one of us.
MUC-7 PLANNING COMMITTEE:
Ralph Grishman, New York University, program co-chair
Elaine Marsh, Naval Research Laboratory, program co-chair
Chinatsu Aone, Systems Research and Applications
Lois Childs, Lockheed Martin
Nancy Chinchor, Science Applications International Corporation
Jim Cowie, New Mexico State University
Rob Gaizauskas, University of Sheffield
Megumi Kameyama, SRI International
Tom Keenan, U.S. Department of Defense
Boyan Onyshkevych, U.S. Department of Defense
Martha Palmer, University of Pennsylvania
Beth Sundheim, NCCOSC NRaD
Marc Vilain, MITRE
Ralph Weischedel, BBN Systems and Technologies