Corpora: Call for participation: meeting on annotation and software standards

From: Nancy M. Ide
Date: Thu Feb 10 2000

                       *** CALL FOR PARTICIPATION ***

                Large Corpus Annotation and Software Standards

               Post-conference session held in conjunction with
              Thursday, May 4, 2000, 1-6pm, Seattle, Washington

    This meeting is intended to bring together researchers and developers
    from a variety of domains in text, speech, video, etc., to look
    broadly at the technical issues that bear on the development of
    software systems and standards for the annotation and exploitation of
    linguistic resources. The goal is to lay the groundwork for the
    definition of a data and system architecture to support corpus
    annotation and exploitation that can be widely adopted within the

    Among the issues to be addressed are:

         o layered data architectures
         o system architectures for distributed databases
         o support for plurality of annotation schemes
         o impact and use of XML/XSL
         o support for multimedia, including speech and video
         o tools for creation, annotation, query and access of corpora
         o mechanisms for linkage of annotation and primary data
         o applicability of semi-structured data models, search and query
           systems, etc.
         o evaluation/validation of systems and annotations

    The motivation for this meeting is the American National Corpus (ANC)
    effort, which will begin corpus creation within the year. We
    anticipate that the ANC will provide a significant resource for
    natural language processing, and we therefore seek to identify
    state-of-the-art methods for its creation, annotation, and
    exploitation. Also, because the ANC will be a national and freely
    available resource, its data and system architecture is likely to
    become a de facto standard. We therefore hope to draw together
    leading researchers and developers to establish a basis for the
    design of a system to support the creation and use of the ANC.

    At present, the format of the meeting is open, and we invite
    suggestions for topics, presentations, etc. Those interested should
    contact before April 1, 2000.


    Nancy Ide
    Department of Computer Science
    Vassar College
    Poughkeepsie, NY 12604-0520 USA
    Tel: +1 914 437-5988 Fax: +1 914 437-7498

    NOTE: A Birds-of-a-feather meeting for those interested in the American
    National Corpus effort will be held immediately following the discussion.

    A related workshop will be held at the LREC conference on May 29-30,
    2000; see for information.

