file parsing problem

tamarind has asked for the wisdom of the Perl Monks concerning the following question:

Re: file parsing problem by tachyon (Chancellor) on Apr 01, 2004 at 02:26 UTC
Set the input record separator to "\n\n" and you will then read one record at a time:

    local $/ = "\n\n";
    while ( my $record = <DATA> ) {
        next unless $record =~ m/^Some Header/;
        my ( $header, @lines ) = split "\n", $record;
        #print "$header @lines\n";
    }

cheers

tachyon
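As a possible variation on the record-at-a-time idea, here is a minimal sketch that uses Perl's paragraph mode (`$/ = ""`) instead of a literal "\n\n". In paragraph mode any run of blank lines ends a record, so a stray extra blank line does not leave a record starting with an empty line. The sample input, the square-bracket header format, and the `%lines_for` hash are assumptions made for illustration, since the original data is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input (not from the thread). Note the two blank
# lines between the records.
my $input = "[alpha]\na1 a2 a3\nb1 b2 b3\n\n\n[beta]\nc1 c2\n";

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my %lines_for;    # header string => array ref of the lines under it
{
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    while ( my $record = <$fh> ) {
        my ( $header, @lines ) = split /\n/, $record;
        # Assumed header format: a whole line of the form [name]
        next unless defined $header && $header =~ /^\[(.*)\]$/;
        $lines_for{$1} = \@lines;
    }
}
close $fh;

for my $name ( sort keys %lines_for ) {
    printf "%s: %d line(s)\n", $name, scalar @{ $lines_for{$name} };
}
```

With the sample input above this prints `alpha: 2 line(s)` and `beta: 1 line(s)`.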
Re: file parsing problem by graff (Chancellor) on Apr 01, 2004 at 03:31 UTC
Are the strings in square brackets a known closed set, or do they vary freely (unpredictably)? If they are a known closed set (and you can enumerate them), does each particular header string always relate to the same number of args in the lines that follow it, or does the number of args per line vary unpredictably for some (or all) headers?

In any case, I don't see anything very complicated here, even for doing input one line at a time (rather than one "record" at a time, as suggested by tachyon). For that matter, I don't see why you need to use a "regex that's appropriate". If the lines following the headers always have space-separated tokens, the best approach would probably be "split".

You didn't mention what you need to do with the data lines, but let's assume that a hash of arrays would be useful, where the hash keys are the header strings and the array elements are the lines that follow the header string:

    my %section;
    my $hdr_string;   # (update: this was intended to be a scalar)

    while (<>) {
        next unless ( /\S/ );     # skip blank lines
        if ( /^\[(.*)\]/ ) {      # this is a header line
            $hdr_string = $1;
        }
        else {                    # this is a data record
            push @{$section{$hdr_string}}, $_;
        }
    }

    # All the lines under a given "header" are now in
    # the array @{$section{header}}; at this point you can
    # loop over the sections to manipulate the data records
    # as appropriate -- e.g.:

    for my $header ( keys %section ) {
        for my $rec ( @{$section{$header}} ) {
            my @recfields = split( /\s+/, $rec );
            # ... do whatever is to be done with field data ...
        }
    }

Having the hash keyed by section name can be handy if there are known sections that need special or complicated treatment: you can pass the particular hash element (which is an array reference) to a subroutine created for that type of section. In fact, you can have a set of subroutine references in a hash keyed by the section header string, so that the data and the handling for the data are accessed by the same key string.
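The idea of keeping subroutine references in a hash keyed by the header string could look roughly like the sketch below. It is only an illustration: the section names (`totals`, `names`), their contents, and the `%handler` and `%result` hashes are all invented here, and `%section` is filled in by hand rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical section data, shaped like the %section hash of arrays
# described above (header string => array of data lines).
my %section = (
    totals => [ "1 2 3\n", "4 5 6\n" ],
    names  => [ "ann bob\n" ],
);

# Handlers keyed by the same header strings. Each one receives the
# array reference holding that section's lines.
my %handler = (
    totals => sub {
        my ($records) = @_;
        my $sum = 0;
        $sum += $_ for map { split ' ', $_ } @$records;
        return $sum;
    },
    names => sub {
        my ($records) = @_;
        return join ',', map { split ' ', $_ } @$records;
    },
);

my %result;
for my $header ( sort keys %section ) {
    my $code = $handler{$header} or next;    # skip sections with no handler
    $result{$header} = $code->( $section{$header} );
}

print "$_ => $result{$_}\n" for sort keys %result;
```

With this made-up data the script prints `names => ann,bob` and `totals => 21`. Sections that have no entry in `%handler` are skipped silently here; whether that or a fatal error is the better default depends on how closed the set of headers really is.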