Corpora: New Release from the LDC

From: LDC Office
Date: Thu Dec 06 2001 - 19:40:53 MET

                 ** CETEMPúblico Version 1.7 **

    The Linguistic Data Consortium (LDC) is pleased to announce the
    availability of CETEMPúblico Version 1.7.

    CETEMPúblico (Corpus de Extractos de Textos Electrónicos MCT/Público), a
    single CD-ROM publication, contains text from the Portuguese daily
    newspaper PÚBLICO. It was created by the Computational
    Processing of Portuguese project through an agreement between PÚBLICO
    and the Portuguese Ministry of Science and Technology (MCT).

    The material includes roughly 2,600 editions of PÚBLICO, dating from
    1991 to 1998 and amounting to approximately 180 million words.

    CETEMPúblico is intended for research and development in natural
    language processing (NLP); additionally, it is suitable for other
    Portuguese language research.

    For more detailed information, please visit:

    in Portuguese:

    in English:

    Institutions that have membership in the LDC during the 2001
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may purchase this publication for $200.

    ** Please note that a signed user agreement is required for both member
    and nonmember requests. **

    If you need additional information before placing your order, or
    would like to inquire about membership in the LDC, please send email to
    <> or call (215) 573-1275.

    Linguistic Data Consortium       Phone: (215) 573-1275
    3615 Market Street               Fax:   (215) 573-2175
    Suite 200                        email:
    Philadelphia, PA 19104-2608      www:
