in reply to using lookaround assertions to grab info

I would usually keep a state variable that tells me which section of the input I'm in.
    my $section = '';    # remember the last section label we encountered
    foreach (@m) {
        if ($_ =~ /^Dig No\s:\s(\w*)\s*Prior:\s*([0-9]*)\s*Digstrt:\s*([0-9]{2}\/[0-9]{2}\/[0-9]{2})\s*Time:\s*([0-9]{2}:[0-9]{2})/) {
            $section = 'Dig No';
            $m{'DIG_NO'}   = $1;
            $m{'PRIORITY'} = $2;
            $m{'DIGDATE'}  = $3;
            $m{'DIGTIME'}  = $4;
        } elsif ($_ =~ /^Address\s*:\s*(.*)Subdivsn/) {
            $section = 'Address';
            $m{'ADDRESS'} = $1;
        } elsif ($_ =~ /^Remarks\s*:\s*(.*)/ || $section eq 'Remarks') {
            # do this if we enter the section or were already in the section
            $section = 'Remarks';
            $m{'REMARKS'} = $1;
        }
    }
Note that this code is quite simple and will only work if just one section continues across multiple lines. If you need more than one section that spans multiple lines, the same basic idea can work, but it takes more work.
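
As a rough sketch of how the same idea might stretch to several multi-line sections: this assumes continuation lines are indented and carry no label of their own. The sample lines, %key_for and %rec are made up for illustration and are not taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real ticket layout may differ.
my @lines = (
    'Address : 12 Main St',
    '          Apt 4',
    'Remarks : call before digging',
    '          gate code needed',
);

# Assumed set of labels whose values may continue onto following lines.
my %key_for = (Address => 'ADDRESS', Remarks => 'REMARKS');

my %rec;
my $section = '';    # hash key of the section we are currently inside

for my $line (@lines) {
    if ($line =~ /^(\w+)\s*:\s*(.*?)\s*$/ && exists $key_for{$1}) {
        # A known label at the start of the line opens a new section.
        $section = $key_for{$1};
        $rec{$section} = $2;
    }
    elsif ($section ne '' && $line =~ /^\s+(\S.*?)\s*$/) {
        # An indented, unlabeled line continues the current section.
        $rec{$section} .= " $1";
    }
}

print "$_: $rec{$_}\n" for sort keys %rec;
```

The state variable holds the hash key rather than the label text, so a continuation line can be appended without a separate branch per section.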

Re^2: using lookaround assertions to grab info
by punkish (Priest) on Jun 03, 2004 at 21:35 UTC
    Note that this code is quite simple and will only work if just one section continues across multiple lines. If you need more than one section that spans multiple lines, the same basic idea can work, but it takes more work.
    Thanks for the advice. I did think of such an approach and then discarded it for the very reason you state above. I'll look at it again and see if I can finagle something useful.

    I guess the best way to state the problem is that the value of any label continues until a new label is encountered even if \n is encountered on the way. The labels are distinguished by \s*\w*\s:
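
    One possible way to express "a value runs until the next label", sketched under some assumptions: the whole record is joined into a single string, and the labels are a known, fixed list rather than the general \s*\w*\s: pattern (a loose pattern could mistake something like 12:30 for a label). The sample text and the names $label and %rec are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record, joined into one string; real tickets may differ.
my $text = join "\n",
    'Address : 12 Main St',
    '  Apt 4 Subdivsn : Oak Hills',
    'Remarks : call first',
    '  gate locked';

# Assumed list of labels. Only these names followed by a colon count as labels.
my $label = qr/(Address|Subdivsn|Remarks)\s*:/;

# Because the pattern has a capturing group, split returns the labels
# interleaved with the text between them. The first field is whatever
# precedes the first label, so it is discarded.
my (undef, @parts) = split /$label/, $text;

my %rec;
while (my ($name, $value) = splice(@parts, 0, 2)) {
    $value = '' unless defined $value;
    $value =~ s/\s+/ /g;     # fold newlines and runs of whitespace
    $value =~ s/^ | $//g;    # trim the ends
    $rec{uc $name} = $value;
}

print "$_: $rec{$_}\n" for sort keys %rec;
```

    Since the split works on the whole string, it does not matter whether a label starts a line or appears in the middle of one.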

      Why not
          my $section = '';    # remember the last section label we encountered
          foreach (@m) {
              if (/^Dig No\s:\s(\w*)\s*Prior:\s*([0-9]*)\s*Digstrt:\s*([0-9]{2}\/[0-9]{2}\/[0-9]{2})\s*Time:\s*([0-9]{2}:[0-9]{2})/) {
                  $section = 'DIGTIME';
                  $m{'DIG_NO'}   = $1;
                  $m{'PRIORITY'} = $2;
                  $m{'DIGDATE'}  = $3;
                  $m{'DIGTIME'}  = $4;
              } elsif (/^Address\s*:\s*(.*)Subdivsn/) {
                  $section = 'ADDRESS';
                  $m{'ADDRESS'} = $1;
              } elsif (/^Remarks\s*:\s*(.*)/) {
                  $section = 'REMARKS';
                  $m{'REMARKS'} = $1;
              } elsif (/^\s*:\s*(.+?)\s*$/) {
                  # continuation line: append to the value of the current section
                  $m{$section} .= $1;
              }
          }
      Have you thought of extracting the match before the if/elsif chain in the foreach loop?
          my $section = '';
          foreach (@m) {
              if (/^\s*([\w\s]*?)\s*:/ && $1) {
                  # if we matched and we captured a section label
                  $section = $1;
              }
              if ($section eq '...') { ... }
          }
        @m is an array of lines and some lines have multiple sections, ergo this exact code would not work. Or am I missing something?