Corpora: ELRA: WLR Validation Centres

From: Magali Duclaux (
Date: Wed Jan 16 2002 - 18:06:12 MET

  • Next message: Mahtab Nikkhou: "Corpora: ELDA promotes EUROMAP Language Technologies in France"

    ELRA Technical Centers
    Our apologies if you receive multiple copies


    1. Preamble

    Describing, assuring and improving the quality of language resources are
    important tasks. The assurance of such quality is an important factor in
    ELRA’s success. In the start up phase of ELRA it was foreseen that a
    Network of Technical Centers should be established to handle quality
    To date a technical center for the validation of spoken language resources
    has been established. ELRA now intends to initiate the establishment of
    a network of technical centers for the validation of written language resources,
    the Validation Centers for Written Language Resources or VC_WLR.
    Written resources include lexicons as well as text corpora, possibly enriched
    with all kinds of annotations (POS-tags, syntactic structures, etc.).
    The procedure to establish the VC_WLR is identical to the one adopted in
    establishing the technical centers for spoken language resources, viz. they
    are to be established via an open call. Those European institutions willing
    to act as a VC_WLR for ELRA should send an offer to ELRA.
    The contents of this offer are described below. In particular, the offer must
    contain a proposal on how to address the problem of the detailed and
    thorough knowledge of a wide variety of languages required by the validation
    of multilingual resources.

    ELRA’s Board will decide which institutions will be selected. The selection
    of each candidate institution will be based on its ability to fulfill the tasks
    described in Section2. The organizational and financial aspects are described
    in Section 3.

    2.Work packages (WP) of the VC_WLR

    2.1 Extending the Methodology for Describing the Quality and Content
    of Existing WLR

    In the catalogue of ELRA many WLR are offered whose quality and
    content is not yet described in a satisfactory way. Some projects have
    resulted in linguistic resources distributed by ELRA that are comparable
    across languages in accordance with a commonly agreed content and
    format specification (e.g. PAROLE). However, almost no written data
    distributed by ELRA have been subject to validation by an external party
    and in accordance with a commonly agreed validation scheme (except for
    a limited number of PAROLE lexicons, and recently in the context of the
    ENABLER project). Though some research into the validation of linguistic
    resources has taken place and recommendations and guidelines have been
    formulated (e.g. Nancy Underwood et al., June 1998; Lou Burnard for text
    corpora), these have to be reviewed and where necessary adapted and extended
    to develop a concrete and workable methodology for the ELRA validation of written
    linguistic resources. The knowledge and expertise gained in the successful
    approach to validation taken in the SpeechDat family of spoken resources and
    by the existing ELRA validation center for spoken resources could be taken into
    consideration here, and its methods and approaches translated into an approach
    adapted for written language resources while maintaining the key elements that
    determined the success of the approach to speech.

    The first task of the VC_WLR is to establish and/or extend the methodology for
    quality and content description so far developed. The related document should
    focus on the quality and content of the WLR offered in the ELRA catalogue.
    A standard form should be developed for describing the content and quality of a
    WLR, starting from the form currently in use and taking into account the work
    carried out within TEI, OLAC, etc. The WLR in the ELRA
    catalog will have to be described according to this standard. This description will
    be used as a basis for providing any (potential) user with a quick overview in the
    ELRA catalogue relating to the quality and content of each WLR offered.

    Output of WP2.1:
    - Document describing methodology concerning quality and content
    - Content and quality description of all ELRA WLR

    2.2 Improving the Quality of Existing WLR

    Existing WLR may have errors that could be removed with reasonable effort.
    The task of the VC_WLR is to establish a procedure to remove these errors.
    Especially a procedure has to be established which handles the errors reported
    by users of WLR (bug reporting procedure). Further, the existing WLR can be
    improved by better documentation, by reformatting according to established
    standards and by content changes. A similar procedure for spoken language
    resources has been proposed and is currently being implemented and experimented
    with, hence it is sensible to investigate to what extent the procedure proposed
    for SLR can be adopted for the improvement of WLR and what modifications and or
    extensions are necessary or desirable.
    The quality of the existing WLR should be gradually improved in accordance with
    a priority scheme that has to be worked out in close cooperation with ELRA’s
    validation committee. The scheme has to be approved by the ELRA board.

    Output of WP 2.2:
    - Report describing the procedure to be used to improve existing WLR
    - Improve existing WLR according to a priority scheme

    2.3 Quality Standards for WLR

    The VC_WLR have to play a leading role in establishing quality standards for
    WLR. for this task the VC_WLR have to cooperate with organizations involved
    in the production of WLR such as the consortia of the PAROLE and SIMPLE
    projects, and with ELRA’s distribution agency (currently ELDA). Additionally,
    the extent to which existing recommendations, guidelines and proposed standards
    from groups such as the EAGLES and ISLE projects can be incorporated should
    be considered throughout.

    Output of WP 2.3:
    - Report describing the procedure for building up relationships with significant
    WLR producers and standards groups
    - Following on from the report, the establishment of those relationships

    2.4 Validation of New WLR

    Owners of WLR regularly offer their WLR to ELRA for distribution. ELRA has
    the distribution carried out by its distribution agency (currently ELDA). Each
    time a WLR is offered for distribution, the task of the VC_WLR is to establish
    in cooperation with the owner of the WLR a manual containing:
    - The specification of the content of the WLR,
    - The validation criteria for checking the quality of the WLR,
    - The procedure to validate the WLR.
    Based on this manual the VC_WLR have to validate any new WLR offered for

    Output of WP 2.4:
    - Report on the validation procedure as specified in a specific contract between
    ELDA and the center(s)

    2.5 Reporting

    Twice a year the VC_WLR must report work undertaken to date to the board of
    ELRA via the head of the validation committee.

    Output of WP 2.5:
    - Status reports

    3. Organizational and Financial Issues

    3.1 Relation between ELRA and VC_WLR

    Concerning the tasks 2.1, 2.2, 2.3, 2.5 as described above the relation between
    ELRA and the institution(s) that are appointed as VC_WLR will be regulated by
    a contract between ELRA and those institutions. The contract has to be renewed
    after every fiscal year of ELRA by the Board of ELRA. Three months before the end
    of each fiscal year of ELRA the Board of ELRA will decide on the financial support
    to be given to the VC_WLR for the next fiscal year to perform the tasks 2.1, 2.2, 2.3,
    2.5. Annually, a letter of intent will describe a budget for the year for the VC_WLR.
    The initial amount made available will be approximately 15K EUR.

    The ELRA validation committee will act as a steering committee for all activities
    related to validation of written resources. All actions proposed by the validation
    committee and agreed upon between the validation committee and the appointed
    VC_WLR will have to be approved by the ELRA Board.

    3.2 Relation between ELDA and the VC_WLR

    Separate contracts will be made with ELDA concerning task 2.4 on a case-by-case

    4. Format and Procedure for Offer

    To apply to be a VC_WLR, send your offer by e-mail (as ASCII or RTF files,
    approx. 2000 words) to the CEO of ELRA (Khalid Choukri, and
    to the head of the ELRA validation committee (Harald Hoege, The e-mail should contain:
    1. Name of the proposing institute
    2. The name of the person at the institute who will be the head of the VC_WLR.
    3. A statement outlining the suitability of the institute to act as a VC_WLR.
    4. A proposal on how the institute plans to provide for the required detailed and
    thorough knowledge of a wide variety of languages.
    5. A list of personnel who will work on the tasks to be undertaken by the VC_WLR.
    6. A possible start date
    7.3 Sketch of the work for the work packages described that can be carried out
    within the fiscal year 2002 (1.1.02 31.12.02) for a budget of inferior or equal to 15KEUR.
    For each work package a rough estimate for the costs should be given.

    Proposals are due by Friday March 1, 2002.

    55-57, rue Brillat Savarin
    75013 Paris
    Tel.: +33 1 43 13 33 33
    Fax: +33 1 43 13 33 30


    This archive was generated by hypermail 2b29 : Wed Jan 16 2002 - 18:27:24 MET