Thanks for the quick reply. I actually don't want it to match this line. The line I'm looking for should look something like:

S 0.0 0.0 (OUTPUT)

How is this matching at all when the input doesn't have a first character of "S"? Shouldn't the ^S be enough to say that this line does not match?

Also, the pattern uses \(\s*INOUT|OUTPUT\s*\). How can this match when there are non-whitespace characters between the opening ( and the string OUTPUT?

My rules, I guess, would be:

Starts with S
Two floats (each may be negative, or have an integer part only)
Opening parenthesis, possibly followed by whitespace
Either INOUT or OUTPUT (case insensitive), possibly followed by whitespace
Closing parenthesis

I may be very confused here, so your help is much appreciated.

Re^3: Incorrect Pattern Matching Behavior
by hv (Prior) on Mar 22, 2006 at 02:03 UTC

    The problem is in the implementation of this clause:

    Either INOUT or OUTPUT

    Your regular expression is getting parsed as:

        /
            # either this
            ^ S \s+ [-]* \d+ [\.\d+]* \s+ [-]* \d+ [\.\d+]* \s* \( \s* INOUT
          |
            # or this
            OUTPUT \s* \)
        /ix

    To achieve your aims, you need to tell perl where your list of alternates starts and ends, with capturing (...) or non-capturing (?:...) parens:

        /
            # all of this
            ^ S \s+ [-]* \d+ [\.\d+]* \s+ [-]* \d+ [\.\d+]* \s* \( \s*
            # and one of these two
            (?: INOUT | OUTPUT )
            # and all of this
            \s* \)
        /ix

    I'd also recommend using the extended layout permitted by the //x flag for long expressions like this.
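
To see the precedence effect in isolation, here is a minimal sketch that runs an ungrouped and a grouped pattern against two sample lines. The variable names are made up for illustration, and the number sub-pattern is a simplified stand-in (an assumption), not the exact one from the original post.

```perl
use strict;
use warnings;

# Ungrouped: the | splits the WHOLE pattern into two alternatives,
# so the second alternative is just "OUTPUT\s*\)" with no ^S anchor.
my $ungrouped = qr/^S\s+-?\d+(?:\.\d+)?\s+-?\d+(?:\.\d+)?\s*\(\s*INOUT|OUTPUT\s*\)/i;

# Grouped: (?:...) limits the alternation to the two keywords.
my $grouped = qr/^S\s+-?\d+(?:\.\d+)?\s+-?\d+(?:\.\d+)?\s*\(\s*(?:INOUT|OUTPUT)\s*\)/i;

my @lines = (
    'onn (bbcreccsnnl_output) !OUTPUT',   # should be rejected
    'S 0.0 0.0 (OUTPUT)',                 # should be accepted
);

for my $line (@lines) {
    printf "%-34s ungrouped=%s grouped=%s\n",
        $line,
        ($line =~ $ungrouped ? 'yes' : 'no'),
        ($line =~ $grouped   ? 'yes' : 'no');
}
```

The first line matches the ungrouped pattern only because "output)" satisfies the second alternative on its own under /i; once the alternation is grouped, the ^S anchor applies to both keywords and the line is rejected.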

    Hope this helps,

    Hugo

Re^3: Incorrect Pattern Matching Behavior
by McDarren (Abbot) on Mar 22, 2006 at 01:59 UTC
    Shouldn't the ^S be enough to say that this line does not match?
    Yes, I believe it should - and I'm afraid that part of it also has me stumped. Perhaps some other monk can explain why that is so.

    Update: ahh, of course - as others have pointed out below - it's because you haven't used parentheses to define the boundaries of your alternation :)

    Anyhow, getting back to your requirements, here is how I would do it:

    Update: oops, I just realised that I posted the wrong pattern. It can be simplified somewhat by grouping the part that matches the floats and using the {2} quantifier. I've updated it (the output remains the same).

        use strict;
        use warnings;

        while (<DATA>) {
            if (/^S\s+(?:\-?\d+(?:\.\d+)?\s+){2}\(\s?(?:(INOUT|OUTPUT))\s?\)/) {
                print "Matched:$_";
            }
            else {
                print "Did NOT match:$_";
            }
        }
        __DATA__
        onn (bbcreccsnnl_output) !OUTPUT
        S 0.0 0.0 (OUTPUT)
        A 0.0 0.0 (OUTPUT)
        S 1 4 5 (OUTPUT)
        S 35 -27 ( INOUT )
        S -26.95 32.73 (OUTPUT )
    The above outputs:
        Did NOT match: onn (bbcreccsnnl_output) !OUTPUT
        Matched:S 0.0 0.0 (OUTPUT)
        Did NOT match:A 0.0 0.0 (OUTPUT)
        Did NOT match:S 1 4 5 (OUTPUT)
        Matched:S 35 -27 ( INOUT )
        Matched:S -26.95 32.73 (OUTPUT )
    Which I believe meets your requirements, yes?
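
For comparison, the same rules can be laid out with the /x flag, one rule per line. This is only a sketch: the helper name is_output_line is hypothetical, and it assumes that only the keyword should be case insensitive while the leading S stays case sensitive, which is one reading of the stated rules.

```perl
use strict;
use warnings;

my $RE = qr/
    ^ S                                  # starts with a literal, upper-case S
    (?: \s+ -? \d+ (?: \. \d+ )? ){2}    # two numbers: optional sign, optional fraction
    \s* \( \s*                           # opening parenthesis, optional whitespace
    (?i: INOUT | OUTPUT )                # either keyword, case insensitive for this group only
    \s* \)                               # optional whitespace, closing parenthesis
/x;

# Hypothetical helper: returns 1 if the line satisfies the rules, else 0.
sub is_output_line {
    my ($line) = @_;
    return $line =~ $RE ? 1 : 0;
}

for my $line ('S 0.0 0.0 (OUTPUT)', 'S 35 -27 ( inout )',
              'S 1 4 5 (OUTPUT)',   'A 0.0 0.0 (OUTPUT)') {
    printf "%-22s %s\n", $line, is_output_line($line) ? 'match' : 'no match';
}
```

Scoping the flag with (?i:...) rather than putting /i on the whole pattern keeps a lower-case "s" at the start of a line from matching; if that should be allowed too, a trailing /i on the whole pattern would do instead.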