in reply to Incorrect Pattern Matching Behavior

Well, it's because the lower-case "output" in your string is being matched (rather than the upper-case which is what I suspect you are expecting). You can confirm that like so:
if ($_ =~ /^S\s+[-]*\d+[\.\d+]*\s+[-]*\d+[\.\d+]*\s*\(\s*IOPUT|OUTPUT\s*\)/i) {
    print "PRE=$`\nMATCH=$&\nPOST=$'\nIt matches with case insensitive...\n";
}
Which outputs:
PRE= onn (bbcreccsnnl_
MATCH=output)
POST= !OUTPUT
It matches with case insensitive...
That's the easy bit :)

Getting your expression to actually do what you want it to may be a little trickier. Perhaps if you could list the "rules" for the match, I (or others) may be able to help you craft an appropriate expression.

Cheers,
Darren :)

Re^2: Incorrect Pattern Matching Behavior
by T.G. Cornholio (Scribe) on Mar 22, 2006 at 00:42 UTC
    Thanks for the quick reply. I actually don't want it to match this line. The line I'm looking for should look something like:

    S 0.0 0.0 (OUTPUT)

    How is this matching at all when the input doesn't have a first character of "S"? Shouldn't the ^S be enough to say that this line does not match?

    Also, it uses \(\s*INOUT|OUTPUT\s*\). How can this match when there are non-whitespace characters between the opening ( and the string OUTPUT?

    My rules, I guess, would be:

    Starts with S
    Two floats (could be negative or integer portion only)
    Opening parenthesis possibly followed by whitespace
    Either INOUT or OUTPUT (case insensitive) possibly followed by whitespace
    Closing parenthesis

    I may be very confused here, so your help is much appreciated.

      The problem is in the implementation of this clause:

      Either INOUT or OUTPUT

      Your regular expression is getting parsed as:

      /
        # either this
        ^ S \s+ [-]* \d+ [\.\d+]* \s+ [-]* \d+ [\.\d+]* \s* \( \s* IOPUT
        |
        # or this
        OUTPUT \s* \)
      /ix

      To achieve your aims, you need to tell perl where your list of alternates starts and ends, with capturing (...) or non-capturing (?:...) parens:

      /
        # all of this
        ^ S \s+ [-]* \d+ [\.\d+]* \s+ [-]* \d+ [\.\d+]* \s* \( \s*
        # and one of these two
        (?: IOPUT | OUTPUT )
        # and all of this
        \s* \)
      /ix

      I'd also recommend using the extended layout permitted by the //x flag for long expressions like this.
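To see the difference in practice, here is a minimal sketch that runs an ungrouped and a grouped pattern against the stray line from this thread and against a line of the wanted shape. The variable names ($ungrouped, $grouped, @lines) are made up for the illustration, the float sub-pattern is a simplified stand-in rather than the original one, and INOUT is used as the first keyword as in the stated rules.

```perl
use strict;
use warnings;

# Ungrouped: the | splits the WHOLE pattern into two alternatives,
# so the ^S anchor and the floats belong only to the first one.
my $ungrouped = qr/^S\s+-?\d+(?:\.\d+)?\s+-?\d+(?:\.\d+)?\s*\(\s*INOUT|OUTPUT\s*\)/i;

# Grouped: (?:...) limits the alternation to the two keywords,
# so everything before and after it applies to both.
my $grouped = qr/^S\s+-?\d+(?:\.\d+)?\s+-?\d+(?:\.\d+)?\s*\(\s*(?:INOUT|OUTPUT)\s*\)/i;

# Illustrative inputs: the stray line, and a line of the wanted shape.
my @lines = (
    ' onn (bbcreccsnnl_output) !OUTPUT',
    'S 0.0 0.0 (OUTPUT)',
);

for my $line (@lines) {
    printf "%-40s ungrouped=%s grouped=%s\n",
        "'$line'",
        ($line =~ $ungrouped ? 'match' : 'no match'),
        ($line =~ $grouped   ? 'match' : 'no match');
}
```

The ungrouped pattern accepts the stray line because its second alternative, OUTPUT\s*\), is free to match anywhere in the string; the grouped one rejects it because ^S now applies to both keywords.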

      Hope this helps,

      Hugo

      Shouldn't the ^S be enough to say that this line does not match?
      Yes, I believe it should - and I'm afraid that part of it also has me stumped. Perhaps some other monk can explain why that is so.

      Update: ahh, of course - as others have pointed out below - it's because you haven't used parentheses to define the boundaries of your alternation :)

      Anyhow, getting back to your requirements, here is how I would do it:

      Update: oops, I just realised that I posted the wrong pattern. It can be simplified somewhat by grouping the part that matches the floats and using the {2} quantifier. I've updated it (output remains the same)

      use strict;
      use warnings;

      while (<DATA>) {
          if (/^S\s+(?:\-?\d+(?:\.\d+)?\s+){2}\(\s?(?:(INOUT|OUTPUT))\s?\)/) {
              print "Matched:$_";
          }
          else {
              print "Did NOT match:$_";
          }
      }
      __DATA__
       onn (bbcreccsnnl_output) !OUTPUT
      S 0.0 0.0 (OUTPUT)
      A 0.0 0.0 (OUTPUT)
      S 1 4 5 (OUTPUT)
      S 35 -27 ( INOUT )
      S -26.95 32.73 (OUTPUT )
      The above outputs:
      Did NOT match: onn (bbcreccsnnl_output) !OUTPUT
      Matched:S 0.0 0.0 (OUTPUT)
      Did NOT match:A 0.0 0.0 (OUTPUT)
      Did NOT match:S 1 4 5 (OUTPUT)
      Matched:S 35 -27 ( INOUT )
      Matched:S -26.95 32.73 (OUTPUT )
      Which I believe meets your requirements, yes?
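If the rules are read literally (case-insensitive keywords, any amount of whitespace inside the parentheses), a variant along these lines might be closer. This is only a sketch: $float, $rule and is_port_line are names invented for the example, and it assumes no space is required between the second float and the opening parenthesis.

```perl
use strict;
use warnings;

# Hypothetical helper: optional sign, integer part, optional fraction.
my $float = qr/-?\d+(?:\.\d+)?/;

# One rule per line, using the x flag for layout and i for case.
my $rule = qr/
    ^S \s+               # starts with S
    $float \s+           # first float
    $float \s*           # second float
    \( \s*               # opening parenthesis, optional whitespace
    (?: INOUT | OUTPUT ) # one of the two keywords
    \s* \)               # optional whitespace, closing parenthesis
/xi;

# Hypothetical helper: returns 1 if the line satisfies the rules, else 0.
sub is_port_line {
    my ($line) = @_;
    return $line =~ $rule ? 1 : 0;
}

for my $line ('S 0.0 0.0 (output)', 'S 35 -27 (   INOUT   )', 'A 0.0 0.0 (OUTPUT)') {
    print is_port_line($line) ? "Matched: $line\n" : "Did NOT match: $line\n";
}
```

Keeping the float in its own qr// means the sign and fraction handling lives in one place, and the /x layout lets each rule sit on its own commented line.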