in reply to Re: Re: Regex Question
in thread Regex Question

Given the format of your data, you would be much better off using "paragraph mode", i.e. setting $/ to '' rather than undef. Then each read will give you exactly what (I think) you are trying to achieve with your regex.

Try this on your data to see what I mean, then see $INPUT_RECORD_SEPARATOR ($/) in perlvar for the details.

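#! perl -slw
use strict;

# Paragraph mode: each read returns one blank-line-delimited record.
# Append your records after a __DATA__ line at the end of the script.
local $/ = '';

while( <DATA> ) {
    print "'$_'";
}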
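If you want to see how paragraph mode treats runs of blank lines before trying it on the real file, here is a minimal sketch. The helper name read_paragraphs and the sample records are invented for illustration (only the DES/TN/DATE/ZONE keywords come from this thread), and it reads from an in-memory string rather than your file.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into records using paragraph mode.
sub read_paragraphs {
    my ($text) = @_;
    # Open a read handle on the string itself (in-memory file).
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';                   # paragraph mode: a run of blank lines ends a record
    chomp( my @records = <$fh> );    # in paragraph mode chomp strips all trailing newlines
    close $fh;
    return @records;
}

# Invented sample: two records separated by three blank lines.
my $sample = "DES first\nDATE one\n\n\n\nTN second\nZONE two\n";
print "'$_'\n" for read_paragraphs($sample);
```

The run of blank lines between the two records counts as a single separator, so extra blank lines between records are harmless. A stray blank line inside a record would still split it in two, which is the case to watch for with a corrupted file.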

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Re: Re: Re: Regex Question
by set_uk (Pilgrim) on Jun 03, 2003 at 11:51 UTC
    I was trying to avoid using a blank line as a record separator because this file is prone to some corruption and I wouldn't want spurious blank lines interfering with the correct splitting of records.

    To use this approach I would have to preprocess the file to strip any blank lines not correctly terminating records.

    If I am going to do this I may as well just look for the record itself. How do I only match a string which is followed by a blank line?

      You might get away with this, provided the spurious blank lines don't turn up just after the ZONE 001 in the third section of your example dataset.

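      #! perl -slw
      use strict;

      # Slurp the whole file into a single string.
      my $data = do{ local $/; <DATA> };

      # /m lets ^ match at the start of every line; /s lets . span newlines.
      print "'$1'" while $data =~ m{
          (
              ^(?:DES|TN) .+?          # a record starts on a line beginning DES or TN
              (?:^DATE|ZONE) [^\n]+    # shortest run up to a DATE line or a ZONE field, then the rest of that line
              \n \s* (?:\n|\z)         # which must be followed by a blank line or the end of the data
          )
      }gmsx;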
