Re: Regex Question

The first thing I noticed was that you are undef'ing $/. Apart from the fact that just localising it is enough to set it to undef, that is usually only done when you want to slurp the entire file in a single read.
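For reference, the usual slurp idiom looks something like the sketch below. It is a minimal illustration, not the original poster's code: the sample data and the in-memory filehandle are assumptions made so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the poster's file.
my $data = "TN 001 0 02 01 05\nRLS\n\nDES 002\nZONE 001\n";

# An in-memory filehandle keeps the sketch self-contained;
# with a real file you would open it by name instead.
open my $fh, '<', \$data or die "Cannot open: $!";

# 'local $/;' leaves $/ undefined inside the do block, so a single
# <$fh> returns the whole file. $/ is restored when the block exits.
my $text = do { local $/; <$fh> };
close $fh;

print length($text), " characters read\n";
```

Because `local` restores $/ when the `do` block exits, reads elsewhere in the program go back to line-at-a-time without any explicit reset.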

The second thing I noticed was that you are using /smx.

If I am interpreting your snippet correctly, you are slurping the whole file, asking that . match \n (/s), asking that ^ and $ match either side of \n (/m), and then hoping that `/(^DES|^TN).*?/` will match just a single line starting with the correct 2 or 3 letters. It won't.

It will start matching at the first line that begins with DES or TN, but because /s lets . match \n, nothing stops .*? from carrying on past the end of that line, potentially through the rest of the file.

By the time the regex gets to your end criteria, there is nothing left for them to match against.
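A small sketch of the difference /s makes. The data is made up, and a literal `DATE` stands in for whatever end criteria the original regex had; neither comes from the poster's code.

```perl
use strict;
use warnings;

# Made-up data; DATE stands in for the original end criteria.
my $text = "HDR\nTN 001 0 02 01 05\nRLS\nZONE 001 07\n\nDES 002\nDATE 9 MAR 2000\n";

# With /s, . also matches \n, so .*? can run past the end of the TN line,
# across the blank line, until the literal DATE can match.
my ($with_s) = $text =~ /^((?:DES|TN).*?DATE)/ms;

# Without /s, . stops at \n, so each match is confined to a single line.
my @lines = $text =~ /^((?:DES|TN).*)$/mg;

print "With /s: '$with_s'\n";
print "One line per match: '$_'\n" for @lines;
```

Even though .*? is non-greedy, "as little as possible" still means "as much as it takes for the rest of the pattern to succeed", and with /s that can be many lines.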

A lot of assumptions based on little evidence, but it does fit what I see:)

Re: Re: Regex Question by set_uk (Pilgrim) on Jun 03, 2003 at 10:45 UTC
I now have:

`((?:^DES|^TN).*?(?:(?:DATE|ZONE)[ A-Z0-9]*)(?=$^$)?)`

But given the data I posted earlier, it still matches ZONE even when it is not followed by a blank line. I'm trying to use `(?=$^$)?` to say "only match the previous pattern if it is followed by the first blank line". But given this data:

`TN 001 0 02 01 05 RLS ZONE 001 07 AO3 08 09 DATE 9 MAR 2000`

it only matches up to ZONE and not DATE.
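One thing worth noting about the pattern above: the trailing `?` in `(?=...)?` makes the lookahead optional, and an optional lookahead always succeeds, so it cannot reject anything. The sketch below compares an optional and a mandatory lookahead. It assumes the flattened sample data really has ZONE and DATE on separate lines, and it uses `\n\n` rather than `$^$` to spell "a blank line follows"; both the line layout and that substitution are assumptions.

```perl
use strict;
use warnings;

# Hypothetical layout of the sample: the ZONE line is NOT followed by a
# blank line, but the DATE line is.
my $rec = "TN 001 0 02 01 05\nRLS\nZONE 001 07 AO3 08 09\nDATE 9 MAR 2000\n\nDES 002\n";

# (?=...)? is a lookahead made optional by the trailing ?, so it never
# constrains anything: ZONE matches whether or not a blank line follows.
my @optional = $rec =~ /^((?:DATE|ZONE)[ A-Z0-9]*)(?=\n\n)?/mg;

# Without the ?, the lookahead is mandatory: only a line that is
# immediately followed by an empty line (\n\n) can match.
my @required = $rec =~ /^((?:DATE|ZONE)[ A-Z0-9]*)(?=\n\n)/mg;

print "optional: @optional\n";
print "required: @required\n";
```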
Re: Re: Re: Regex Question by BrowserUk (Patriarch) on Jun 03, 2003 at 11:11 UTC
Given the format of your data, you would be much better off using "paragraph mode", i.e. setting $/ to '' rather than undef. Each read will then give you exactly what (I think) you are trying to achieve with your regex. Try this on your data to see what I mean, then see $INPUT_RECORD_SEPARATOR in perlvar for the details.

```perl
#! perl -slw
use strict;

local $/ = '';    # paragraph mode: one or more blank lines end a record
while( <DATA> ) {
    print "'$_'";
}
# put your data after a __DATA__ line at the end of the script
```

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
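As a sketch of how paragraph mode behaves (hypothetical data read from an in-memory filehandle instead of DATA), note that a run of several blank lines still produces only one record boundary:

```perl
use strict;
use warnings;

# Hypothetical data: note the run of several blank lines between records.
my $data = "TN 001 0 02 01 05\nRLS\nZONE 001 07\n\n\n\nDES 002\nDATE 9 MAR 2000\n";
open my $fh, '<', \$data or die "Cannot open: $!";

my @records;
{
    # Paragraph mode: each read returns one block of non-empty lines;
    # consecutive empty lines are treated as a single separator.
    local $/ = '';
    while ( my $rec = <$fh> ) {
        push @records, $rec;
    }
}
close $fh;

printf "%d records\n", scalar @records;
print "---\n$_" for @records;
```

Note that a line containing only spaces or tabs is not empty as far as paragraph mode is concerned, so it will not split a record.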
Re: Re: Re: Re: Regex Question by set_uk (Pilgrim) on Jun 03, 2003 at 11:51 UTC
I was trying to avoid using a blank line as a record separator because this file is prone to some corruption, and I wouldn't want spurious blank lines interfering with the correct splitting of records. To use this approach I would have to preprocess the file to strip any blank lines that don't correctly terminate a record. If I am going to do that, I may as well just match the record itself. How do I match a string only if it is followed by a blank line?
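One possible answer, as a minimal sketch: put the blank line inside a lookahead, which requires it to follow without consuming it. To guard against the spurious blank lines mentioned above, this version additionally assumes that a genuine record boundary is a blank line immediately followed by the next DES or TN header (or the end of the file). That assumption and the sample data are illustrative, not something established in the thread.

```perl
use strict;
use warnings;

# Hypothetical file contents: the first record contains a spurious blank
# line (after RLS) that does NOT start a new record.
my $text = "TN 001 0 02 01 05\nRLS\n\nZONE 001 07\nDATE 9 MAR 2000\n\nDES 002\nDATE 1 APR 2001\n";

# A record starts with DES or TN at the beginning of a line and runs
# lazily (across lines, thanks to the s modifier) up to the first blank
# line that is followed by the next DES or TN header, or to the end of
# the string. The lookahead requires that text to follow without consuming it.
my @records = $text =~ /
    ^ ( (?:DES|TN) \b .*? )     # capture: header line plus everything after it
    (?= \n \n (?:DES|TN) \b     # ... up to a blank line before the next header
      | \n* \z )                # ... or up to the end of the string
/msxg;

print "---\n$_\n" for @records;
```

If corruption can also produce stray lines that begin with DES or TN, no pattern of this shape will separate records reliably, and some preprocessing may be unavoidable after all.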
Re: Re: Re: Re: Re: Regex Question by BrowserUk (Patriarch) on Jun 03, 2003 at 12:53 UTC