Re: Re: parsing of large files

Hi arturo,

The solution you presented will overwrite the output file on each occurrence of the section start. Furthermore, writing to closed filehandles isn't a very clean solution, IMHO.

You could try the magic '..' (range) operator, which acts as a flip-flop in scalar context:

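# Duplicate STDOUT so the example prints to the terminal.
open OUT, '>&STDOUT' or die "Cannot dup STDOUT: $!";

while ( <DATA> )
{
  # The range is true from a start-marker line through the next
  # end-marker line; the second test drops the marker lines themselves.
  print OUT if /start-marker/../end-marker/
     and !/(start-marker|end-marker)/;
}

close OUT or die "Cannot close OUT: $!";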
__DATA__
a
b
c
start-marker
d
e
end-marker
f
g
start-marker
h
i
end-marker
j
k

This prints the lines between the markers (d, e, h and i in my example) but skips the marker lines themselves.
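As a possible variation, the marker lines can also be skipped without matching them a second time. In scalar context '..' returns the empty string outside the range and a sequence number inside it, and the last number in a range has "E0" appended. The sketch below relies on that; lines_between_markers is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between the markers.
# $start_re and $end_re are compiled patterns (qr//).
sub lines_between_markers {
    my ($start_re, $end_re, @lines) = @_;
    my @inside;
    for (@lines) {
        # '' outside the range, otherwise 1, 2, 3, ... with 'E0'
        # appended on the line that closes the range.
        my $seq = /$start_re/ .. /$end_re/;
        next unless $seq;          # outside any range
        next if $seq == 1;         # the start-marker line
        next if $seq =~ /E0$/;     # the end-marker line
        push @inside, $_;
    }
    return @inside;
}
```

Note that the flip-flop keeps its state between calls, so if the input stops inside an open range, the next call starts out still inside it.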

HTH,

(update: fixed some layout issues and used the actual '..' operator instead of the '...' one!).

-- JaWi

"A chicken is an egg's way of producing more eggs."


Replies are listed 'Best First'.
Re: Re: Re: parsing of large files
by Anonymous Monk on Mar 20, 2003 at 13:11 UTC

First, thanks!

One correction: there is no "end-section". The end of a section is the start of the next one, and a "start-section" is one of 5-6 different tags that implement some kind of hierarchy between sections. I could still split the big file into files, but in any case I need to keep a data structure with the file names, section headers, etc., so it seems like double work.

I finally solved it with the Tie::File module: reading "line by line" from the tied array, I save for each section its name, its place in the hierarchy, and its start and end index. I ended up with one read of the whole file, and then direct access to each section by its name.

However, I was worried about the size of the tied array, but I guess it won't be bigger than 4 or 8 bytes multiplied by the number of lines. I can live with that (and correct me if I'm wrong :-)).

Thanks again!
Keren.
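The indexing approach described above might look roughly like the following sketch. The tag format is an assumption: the post does not show the real tags, so header lines are taken to look like "<H1> name", "<H2> name", and so on, with the digit as the hierarchy level. build_section_index and section_body are hypothetical names, and section names are assumed to be unique.

```perl
use strict;
use warnings;
use Tie::File;

# ASSUMPTION: a header line looks like "<H1> name", "<H2> name", ...
# Replace this pattern with the real 5-6 tags.
my $HEADER_RE = qr/^<H(\d)>\s*(.*)$/;

# One pass over the tied array: name => { level, start, end }.
sub build_section_index {
    my ($lines) = @_;              # reference to the Tie::File array
    my (%index, $current);
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $HEADER_RE;
        # A new header closes the previous section on the line before it.
        $index{$current}{end} = $i - 1 if defined $current;
        $current = $2;
        $index{$current} = { level => $1, start => $i, end => undef };
    }
    # The last section runs to the end of the file.
    $index{$current}{end} = $#$lines if defined $current;
    return \%index;
}

# Fetch the body of a section (header line excluded) by name.
sub section_body {
    my ($lines, $index, $name) = @_;
    my $s = $index->{$name} or return;
    return @{$lines}[ $s->{start} + 1 .. $s->{end} ];
}

# Usage (the file name is a placeholder):
#   use Fcntl 'O_RDONLY';
#   tie my @lines, 'Tie::File', 'big.txt', mode => O_RDONLY
#       or die "Cannot tie file: $!";
#   my $index = build_section_index(\@lines);
#   print "$_\n" for section_body(\@lines, $index, 'some section');
```

Storing only line indexes keeps the index small, and each later lookup goes through the tied array instead of re-reading the file from the top.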
