In any case, I don't see anything very complicated here, even for doing input one line at a time (rather than one "record" at a time, as suggested by tachyon). For that matter, I don't see why you need to use a "regex that's appropriate". If the lines following the headers always have space-separated tokens, the best approach would probably be "split".
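As a small illustration of the "split" approach (the sample line here is made up), note that splitting on a single-space string behaves a little differently from splitting on the pattern `/\s+/`: the special `' '` form discards leading whitespace, while the regex form returns an empty first field when the line is indented.

```perl
use strict;
use warnings;

# Hypothetical data line with leading and repeated whitespace.
my $line = "  alpha beta   gamma\n";

# Special case: a single-space string skips leading whitespace
# and splits on any run of whitespace.
my @fields = split ' ', $line;

# Regex form: leading whitespace yields an empty first field.
my @naive = split /\s+/, $line;

print scalar(@fields), " fields with ' '\n";
print scalar(@naive),  " fields with /\\s+/\n";
```

In both cases the trailing newline counts as whitespace, so no chomp is needed before splitting.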
You didn't mention what you need to do with the data lines, but let's assume that a hash of arrays would be useful, where the hash keys are the header strings and the array elements are the lines that follow the header string:
    my %section;
    my $hdr_string;   # (update: this was intended to be a scalar)
    while (<>) {
        next unless ( /\S/ );   # skip blank lines
        if ( /^\[(.*)\]/ ) {    # this is a header line
            $hdr_string = $1;
        }
        else {                  # this is a data record
            push @{$section{$hdr_string}}, $_;
        }
    }
    # All the lines under a given "header" are now in
    # the array @{$section{header}}; at this point you can
    # loop over the sections to manipulate the data records
    # as appropriate -- e.g.:
    for my $header ( keys %section ) {
        for my $rec ( @{$section{$header}} ) {
            # split ' ' ignores leading whitespace and the trailing newline
            my @recfields = split( ' ', $rec );
            # ... do whatever is to be done with field data ...
        }
    }

Having the hash keyed by section name can be handy if there are known sections that need special or complicated treatment -- you can pass the particular hash element (which is an array reference) to a subroutine created for that type of section. In fact, you can have a set of subroutine references in a hash keyed by the section header string, so that the data and the handling for the data are accessed by the same key string.
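Here is a rough sketch of that last idea, a hash of subroutine references keyed by the same header strings as the data. The section names ("totals", "names"), the sample records and the handler bodies are all invented for illustration; your real sections will need their own handlers.

```perl
use strict;
use warnings;

# Hypothetical sample data, shaped like the %section hash of arrays.
my %section = (
    totals => [ "1 2 3\n", "4 5\n" ],
    names  => [ "alice bob\n" ],
);

# Hypothetical handlers, one per known section header.
# Each receives the array reference holding that section's lines.
my %handler = (
    totals => sub {
        my ($recs) = @_;
        my $sum = 0;
        $sum += $_ for map { split ' ', $_ } @$recs;
        return $sum;
    },
    names => sub {
        my ($recs) = @_;
        return join ',', map { split ' ', $_ } @$recs;
    },
);

my %result;
for my $header ( sort keys %section ) {
    my $code = $handler{$header} or next;   # no handler: skip this section
    $result{$header} = $code->( $section{$header} );
}
print "$_: $result{$_}\n" for sort keys %result;
```

Sections with no entry in the handler hash are simply skipped here; you might prefer to warn about them or fall back to a default subroutine.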
In reply to Re: file parsing problem by graff, in thread file parsing problem by tamarind