Are the strings in square brackets a known closed set, or do they vary freely (unpredictably)? If they are a known closed set (and you can enumerate them), does each particular header string always relate to the same number of args in the lines that follow it, or does the number of args per line vary unpredictably for some (or all) headers?

In any case, I don't see anything very complicated here, even for doing input one line at a time (rather than one "record" at a time, as suggested by tachyon). For that matter, I don't see why you need to use a "regex that's appropriate". If the lines following the headers always have space-separated tokens, the best approach would probably be "split".
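As a minimal sketch of the "split" idea (the sample line here is made up, since the actual data lines were not shown in the thread), splitting on whitespace looks something like this:

```perl
use strict;
use warnings;

# Hypothetical data line; the real file contents are an assumption.
my $line = "alpha  beta gamma\n";

# split with a single-space string splits on runs of whitespace and
# ignores leading whitespace; the trailing newline counts as whitespace,
# so no chomp is needed before splitting.
my @tokens = split ' ', $line;

print scalar(@tokens), " tokens: @tokens\n";
```

If the fields could contain embedded spaces (quoted strings, say), a plain split would no longer be enough, but nothing in the question suggests that.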

You didn't mention what you need to do with the data lines, but let's assume that a hash of arrays would be useful, where the hash keys are the header strings and the array elements are the lines that follow the header string:

    use strict;
    use warnings;

    my %section;
    my $hdr_string;    # (update: this was intended to be a scalar)

    while (<>) {
        next unless /\S/;               # skip blank lines
        if ( /^\[(.*)\]/ ) {            # this is a header line
            $hdr_string = $1;
        }
        else {                          # this is a data record
            push @{ $section{$hdr_string} }, $_;
        }
    }

    # All the lines under a given "header" are now in
    # the array @{$section{header}}; at this point you can
    # loop over the sections to manipulate the data records
    # as appropriate -- e.g.:

    for my $header ( keys %section ) {
        for my $rec ( @{ $section{$header} } ) {
            # split ' ' breaks on runs of whitespace and skips leading whitespace
            my @recfields = split ' ', $rec;
            # ... do whatever is to be done with field data ...
        }
    }

Having the hash keyed by section name can be handy if there are known sections that need special or complicated treatment -- you can pass the particular hash element (which is an array reference) to a subroutine created for that type of section. In fact, you can have a set of subroutine references in a hash keyed by the section header string, so that the data and the handling for the data are accessed by the same key string.
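Here is a rough sketch of that last idea, a hash of subroutine references keyed by header string. The section names, the sample records, and the handlers themselves are all invented for illustration; only the shape of %section matches the code above.

```perl
use strict;
use warnings;

# Hypothetical section data, shaped like the %section hash built earlier:
# header string => array ref of raw data lines.
my %section = (
    alpha => [ "1 2 3\n", "4 5\n" ],
    beta  => [ "x y\n" ],
);

# Hypothetical handlers, keyed by the same header strings as the data.
my %handler = (
    alpha => sub {
        my ($recs) = @_;
        my $sum = 0;
        for my $rec (@$recs) {
            $sum += $_ for split ' ', $rec;    # add up every numeric field
        }
        return "sum=$sum";
    },
);

# Fallback for headers that have no dedicated handler.
my $default = sub {
    my ($recs) = @_;
    return scalar(@$recs) . " record(s)";
};

my %result;
for my $header ( sort keys %section ) {
    my $code = $handler{$header} || $default;
    $result{$header} = $code->( $section{$header} );
    print "$header: $result{$header}\n";
}
```

The same key string picks both the data and the code that handles it, so adding support for a new section type only means adding one more entry to %handler.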

In reply to Re: file parsing problem by graff
in thread file parsing problem by tamarind
