in reply to Regex Question

The first thing I noticed was that you are undef'ing $/. (Incidentally, just localising it is enough to set it to undef.) That is usually done when you are slurping the entire file.
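Here is a minimal sketch of that idiom. It assumes a small in-memory file (the $data string is hypothetical) in place of the real one. Inside the do block, `local $/;` on its own leaves $/ undefined, so a single read returns the whole file, and $/ gets its old value back when the block exits.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real file.
my $data = "DES 001 first record\nTN 002 second record\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $contents = do {
    local $/;    # localised without a value, so $/ is undef: slurp mode
    <$fh>;       # a single read returns the entire file
};
close $fh;

# Outside the do block, $/ has its default value of "\n" again.
print "Read ", length($contents), " characters in one go\n";
```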

The second thing I noticed was that you are using /smx.

If I am interpreting your snippet correctly, you are slurping the whole file, asking that . match \n (/s), asking that ^ and $ match either side of \n (/m), and then hoping that

/(^DES|^TN).*?/ will match just a single line, with the correct first 2 or 3 letters. It won't.

It will start matching at the first newline followed by DES or TN, but it can then carry on matching across the rest of the file, as there is nothing to stop .*? from matching newlines.

By the time you get to your end criteria, there is nothing left for it to match against.
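A small sketch of the effect described above, using hypothetical data. With /s, .*? is free to run past the end of the first TN line to reach DATE two lines further down. Without /s, . stops at "\n", so the match stays within a single line.

```perl
use strict;
use warnings;

# Hypothetical data: the first TN line has no DATE on it.
my $text = "TN 001 RLS\nXX 002 other\nTN 003 DATE 9 MAR 2000\n\n";

# With /s, . also matches "\n", so .*? runs on past the end of the
# first TN line until it finds DATE on a later line.
my ($with_s) = $text =~ /(^TN.*?DATE)/sm;

# Without /s, . cannot match "\n", so the first TN line fails and the
# match is found on the single line that has both TN and DATE.
my ($without_s) = $text =~ /(^TN.*?DATE)/m;

print "with /s:    '$with_s'\n";
print "without /s: '$without_s'\n";
```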

A lot of assumptions based on little evidence, but it does fit what I see:)

Re: Re: Regex Question
by set_uk (Pilgrim) on Jun 03, 2003 at 10:45 UTC
    I now have:-
    ((?:^DES|^TN).*?(?:(?:DATE|ZONE)[ A-Z0-9]*)(?=$^$)?)
    But given the data I posted earlier, it still matches ZONE even when it is not followed by a blank line.

    I'm trying to use (?=$^$)? to say: only match the previous pattern if it is followed by the first blank line.

    But given this data:-

    TN 001 0 02 01 05 RLS ZONE 001 07 AO3 08 09 DATE 9 MAR 2000
    it only matches up to ZONE, not DATE.
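One likely reason, shown as a sketch on a hypothetical two-line record: the trailing ? makes the lookahead optional. Matching it zero times always succeeds, so it never forces the lazy .*? past the first ZONE. The same lookahead without the ?, written here with \n\n rather than $^$ (a substitution for illustration), does constrain the match.

```perl
use strict;
use warnings;

# Hypothetical sample: ZONE appears first, but only the line ending in
# the DATE field is followed by a blank line.
my $text = "TN 001 RLS ZONE 001\nMORE DATE 9 MAR 2000\n\nDES 002\n";

my ( $optional, $required );
{
    # Perl warns about a quantifier on a zero-length assertion; silence it here.
    no warnings 'regexp';

    # (?=\n\n)? may match zero times, which always succeeds, so the lazy
    # .*? is happy to stop at the first ZONE.
    ($optional) = $text =~ /^(TN.*?(?:DATE|ZONE))(?=\n\n)?/s;
}

# Without the ?, the lookahead is mandatory, so .*? has to keep extending
# until DATE|ZONE (plus the rest of that line) is followed by a blank line.
($required) = $text =~ /^(TN.*?(?:DATE|ZONE)[ A-Z0-9]*)(?=\n\n)/s;

print "optional lookahead:  '$optional'\n";
print "mandatory lookahead: '$required'\n";
```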

      Given the format of your data, you would be much better off using "paragraph mode", i.e. setting $/ to '' rather than undef. Then each read will give you exactly what (I think) you are trying to achieve with your regex.

      Try this on your data to see what I mean, then see $INPUT_RECORD_SEPARATOR in perlvar for the details.

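      #! perl -slw
      use strict;

      # Paragraph mode: each read returns one blank-line-separated record.
      local $/ = '';
      while( <DATA> ) {
          print "'$_'";
      }

      # Append your sample data after the __DATA__ line.
      __DATA__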
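A self-contained variant of the same idea, assuming hypothetical data in an in-memory file instead of __DATA__. It also shows that paragraph mode treats a run of several blank lines as a single separator, so extra blank lines between records do not produce empty records. A stray blank line inside a record would still split it in two.

```perl
use strict;
use warnings;

# Hypothetical data: note the run of three blank lines after the first record.
my $data = "TN 001 RLS\nDATE 9 MAR 2000\n\n\n\nDES 002 ZONE 004\n\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    while ( my $record = <$fh> ) {
        chomp $record;    # in paragraph mode, chomp removes all trailing newlines
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
print "'$_'\n" for @records;
```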

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


        I was trying to avoid using a blank line as a record separator because this file is prone to some corruption and I wouldn't want spurious blank lines interfering with the correct splitting of records.

        To use this approach I would have to preprocess the file to strip any blank lines not correctly terminating records.

        If I am going to do this, I may as well just look for the record itself. How do I match a string only if it is followed by a blank line?
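One way to do that, sketched on hypothetical slurped data: put a lookahead with no ? after it at the end of the pattern. (?=\n[ \t]*\n) requires the match to be followed by a newline and then a line that is empty or holds only spaces/tabs. Because it is zero-width, the blank line is not part of the captured record, and the lazy .*? stops at the first such blank line.

```perl
use strict;
use warnings;

# Hypothetical slurped file contents.
my $text = "TN 001 RLS ZONE 001\nMORE DATE 9 MAR 2000\n\nDES 002 ZONE 004\n\nJUNK\n";

my @records;
# /m: ^ matches at the start of every line; /s: . may cross newlines.
# The mandatory lookahead only succeeds where a blank line follows.
while ( $text =~ /^((?:DES|TN)\b.*?)(?=\n[ \t]*\n)/msg ) {
    push @records, $1;
}

print "'$_'\n" for @records;
```

A final record with no blank line after it (e.g. at the very end of the file) will not match this pattern; adding an alternative such as |\n?\z inside the lookahead would also accept end of string.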