Re: Corpora: kwic concordances with Perl
Noord G.J.M. van (firstname.lastname@example.org)
Fri, 8 Oct 1999 10:20:35 +0200 (METDST)
Doug Cooper writes:
> At 22:15 7/10/99 +0200, Noord G.J.M. van wrote:
> >no, this is not a good idea for large files (like corpora). You
> >have the full file in memory; you don't want that.
> Oh, if it's spectacularly big you can just use some embedded
> non-text-item separator tag (eg </END-OF-BOOK>) to reduce size:
Well, my suggestion was to use paragraph breaks for that. That seems
more general than </END-OF-BOOK> or whatever:
$/=""; # reads a paragraph at a time. This gives unexpected results on
# dos files (more like slurp then...
# but if you use dos files you're asking for trouble anyway
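
For what it's worth, here is a rough sketch of how paragraph mode might be combined with a kwic search, so that only one paragraph is in memory at a time. The sub names (kwic_lines, kwic_file), the bracketed output format and the context width are my own assumptions for illustration, not anything from the earlier messages in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one kwic line for every match of $word in
# $text, with up to $width characters of context on each side.
sub kwic_lines {
    my ($text, $word, $width) = @_;
    (my $flat = $text) =~ s/\s+/ /g;    # collapse newlines so context stays on one line
    my @out;
    while ($flat =~ /\b\Q$word\E\b/gi) {
        my ($start, $end) = ($-[0], $+[0]);             # offsets of the current match
        my $lstart = $start > $width ? $start - $width : 0;
        my $left   = substr($flat, $lstart, $start - $lstart);
        my $hit    = substr($flat, $start, $end - $start);
        my $right  = substr($flat, $end, $width);
        push @out, sprintf("%*s [%s] %s", $width, $left, $hit, $right);
    }
    return @out;
}

# Hypothetical helper: read the filehandle one paragraph at a time and
# collect the kwic lines for each paragraph.
sub kwic_file {
    my ($fh, $word, $width) = @_;
    local $/ = "";                      # paragraph mode, restored when the sub returns
    my @out;
    while (my $para = <$fh>) {
        push @out, kwic_lines($para, $word, $width);
    }
    return @out;
}

# Usage (assumed): perl kwic.pl WORD FILE...
if (@ARGV) {
    my $word = shift @ARGV;
    for my $file (@ARGV) {
        open(my $fh, '<', $file) or die "cannot open $file: $!";
        print "$_\n" for kwic_file($fh, $word, 30);
        close($fh);
    }
}
```

Note that context never crosses a paragraph boundary in this sketch, which is the price of not holding the whole file in memory. For printing a very large corpus you would probably print inside the loop rather than collect the lines in an array.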