
Hi arturo,

The solution you presented will overwrite the output file on each occurrence of the section start. Furthermore, writing to closed filehandles isn't a very clean solution, IMHO.

You could try to use the magic '..' operator:

open OUT, '>&STDOUT' or die;
while ( <DATA> ) {
    print OUT if /start-marker/../end-marker/ and !/(start-marker|end-marker)/;
}
close OUT or die;
__DATA__
a
b
c
start-marker
d
e
end-marker
f
g
start-marker
h
i
end-marker
j
k

This will print the lines between the markers (d, e, h and i in my example) but skip the marker lines themselves.
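
As a variation, the second regex can be dropped by looking at the value the '..' operator returns in scalar context: a sequence number that starts at 1 on the start line and has "E0" appended on the end line. A minimal sketch, assuming the same sample data held in an array (the names @lines, @inside and $seq are just for illustration):

```perl
use strict;
use warnings;

my @lines = qw(a b c start-marker d e end-marker f g start-marker h i end-marker j k);

my @inside;
for my $line (@lines) {
    # Scalar '..' is a flip-flop: false outside a range, 1 on the start line,
    # 2, 3, ... on the following lines, and "<n>E0" on the end line.
    my $seq = ( $line =~ /start-marker/ ) .. ( $line =~ /end-marker/ );

    # Keep only lines strictly between the two markers.
    push @inside, $line if $seq && $seq > 1 && $seq !~ /E0$/;
}

print "$_\n" for @inside;
```

Whether this reads better than the extra regex is a matter of taste; it does avoid matching every line against the markers a second time.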

HTH,

(update: fixed some layout issues and used the actual '..' operator instead of the '...' one!).

-- JaWi

"A chicken is an egg's way of producing more eggs."

Re: Re: Re: parsing of large files
by Anonymous Monk on Mar 20, 2003 at 13:11 UTC
    First, thanks!

    And: there is no "end-section". The end of a section is the start of the next one, and a "start-section" is one of 5-6 different tags that implement some kind of hierarchy between sections.
    I could still split the big file into separate files, but in any case I need to keep a data structure with the file names, section headers, etc., so it seems like double work.
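
The "a start tag also ends the previous section" rule can be sketched roughly as follows. The tag names (=book, =chapter, =section) and their levels are made up for illustration, since the real 5-6 tags are not shown in this thread:

```perl
use strict;
use warnings;

# Hypothetical tags, mapped to their depth in the hierarchy (0 = top level).
my %level_of = ( '=book' => 0, '=chapter' => 1, '=section' => 2 );
my $tag_re   = join '|', map { quotemeta } keys %level_of;

my @lines = (
    '=book Perl',     'intro text',
    '=chapter One',   'c1 line 1', 'c1 line 2',
    '=section One-A', 's1a line',
    '=chapter Two',   'c2 line',
);

my @sections;
for my $line (@lines) {
    if ( $line =~ /^($tag_re)\s+(.*)$/ ) {
        # A start tag implicitly closes the previous section and opens a new one.
        push @sections, { name => $2, level => $level_of{$1}, body => [] };
    }
    elsif (@sections) {
        # Any other line belongs to the most recently opened section.
        push @{ $sections[-1]{body} }, $line;
    }
}

# Print an indented outline with the number of body lines per section.
printf "%s%s (%d lines)\n", '  ' x $_->{level}, $_->{name}, scalar @{ $_->{body} }
    for @sections;
```

This keeps every section body in memory, so for a really large file it only illustrates the boundary logic, not a storage strategy.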

    I finally solved it with the Tie::File module: reading "line by line" from the tied array, and saving for each section its name, its place in the hierarchy, and its start and end index.
    I ended up with one read of the whole file, and then direct access to each section by its name.
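
A rough sketch of that approach, not the actual code: the tags, the sample data and the names %index and %level_of are assumptions, and a temporary file stands in for the big input file.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Write a small sample file; the '=chapter' / '=section' tags are hypothetical.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} "$_\n"
    for '=chapter One', 'a', 'b', '=section One-A', 'c', '=chapter Two', 'd', 'e';
close $fh or die "close: $!";

# Each array element is one line of the file (newline already removed).
tie my @lines, 'Tie::File', $filename or die "Cannot tie $filename: $!";

my %level_of = ( '=chapter' => 0, '=section' => 1 );
my ( %index, $current );

# One pass over the file: remember where each section's body starts and ends.
for my $i ( 0 .. $#lines ) {
    next unless $lines[$i] =~ /^(=chapter|=section)\s+(.*)$/;
    $index{$current}{end} = $i - 1 if defined $current;    # close previous section
    $current = $2;
    $index{$current} = { level => $level_of{$1}, start => $i + 1, end => $#lines };
}

# Later: direct access to a section by name, without rescanning the file.
my $s    = $index{'One-A'};
my @body = @lines[ $s->{start} .. $s->{end} ];
print "One-A: @body\n";

untie @lines;
```

The index stores only a name, a level and two integers per section, so its size depends on the number of sections, not on the number of lines.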

    However, I was worried about the size of the tied array, but I guess it won't be bigger than 4 or 8 bytes multiplied by the number of lines. I can live with that (and correct me if I'm wrong :-)).

    thanks again!
    Keren.