Re: Corpora: line joining

From: Susana Sotelo Docio
Date: Sat Feb 24 2001 - 12:33:42 MET



    > I need to fix an output from a tagger and join consecutive lines of text, so
    > that, for example, this:
    > de PREP
    > a ART
    > turns into this:
    > da CPR
    > Does anyone know how to do this in sed or perl?

    If the output of the tagger is a big file, you might prefer flex (under
    Unix/Linux). The rule file would be:

    ------------------------------file contrac.lex------------------
    %%
    ^de\tPREP\na\tART\n { printf("da\tCPR\n"); }

    Compile it, then run it on the tagger output:

       flex contrac.lex; gcc -o contrac lex.yy.c -lfl

       contrac < > tagged_text.out

    If you prefer perl, the script could be something like:


        while (<>) {
          if (/^de\tPREP\n/) {
            $newline = <>;    # look ahead one line
            if ($newline =~ /^a\tART\n/) { print "da\tCPR\n" }
            else { print $_ . $newline }
          }
          else { print }
        }

    Syntax: > tagged_text.out
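
    The same one-line lookahead can be sketched in Python, for anyone who
    wants to extend it to more than one contraction. This is only an
    illustration, not part of the original scripts: the CONTRACTIONS table
    and the join_contractions name are made up here, and the sketch assumes
    tab-separated "form<TAB>tag" lines ending in \n.

```python
# Hypothetical rule table: (first line, second line) -> joined line.
# Only the de + a -> da rule comes from the question; add others as needed.
CONTRACTIONS = {("de\tPREP", "a\tART"): "da\tCPR"}


def join_contractions(lines):
    """Yield lines without their newline, merging adjacent pairs found in CONTRACTIONS."""
    pending = None  # previous line, held back until its successor is seen
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None and (pending, line) in CONTRACTIONS:
            # The held line and the current one form a known pair: emit the merge.
            yield CONTRACTIONS[(pending, line)]
            pending = None
        else:
            # No match: release the held line and hold the current one instead.
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending  # flush the last held line at end of input


sample = ["o\tART\n", "de\tPREP\n", "a\tART\n", "casa\tN\n"]
for out in join_contractions(sample):
    print(out)
```

    Holding back one line, instead of reading ahead and printing both, means
    a "de PREP" line that directly follows another "de PREP" line is still
    checked against its own successor. To filter a real file, pass sys.stdin
    (or an open file) in place of the sample list.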

    Under DOS, you must replace \n with \r\n. I assume tabs between word forms
    and tags.
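
    Instead of editing every pattern for DOS files, one option is to peel
    the line ending off before matching and put it back afterwards. A small
    Python sketch of that idea follows; split_ending and parse_tagged are
    hypothetical helper names, and a single tab between form and tag is
    assumed, as above.

```python
def split_ending(raw):
    """Return (content, ending), where ending is CRLF, LF, or the empty string."""
    for ending in ("\r\n", "\n"):  # test the longer ending first
        if raw.endswith(ending):
            return raw[: -len(ending)], ending
    return raw, ""


def parse_tagged(raw):
    """Split a 'form<TAB>tag' line into (form, tag, ending)."""
    content, ending = split_ending(raw)
    form, tag = content.split("\t", 1)  # assumes exactly one separating tab
    return form, tag, ending


print(parse_tagged("de\tPREP\r\n"))
print(parse_tagged("a\tART\n"))
```

    Note that Python's open() translates \r\n to \n by default in text mode,
    so these helpers only matter when the file is opened with newline="" or
    read as bytes and decoded by hand.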

    Susana Sotelo Docío
    Facultade de Filoloxía
    Universidade de Santiago
    "Neunu ti at a abberrer mai si thocceddas a sas jannas
    cun mudos thoccos de ocros" #96506
