Voronich has asked for the wisdom of the Perl Monks concerning the following question:

I've come across another strange wrinkle in this regex-driven dispatch that I've been adding to.

I have two lines of representative text:

111 MatchMe [This is why]
222 I need to be dispatched to the same callback: For This Reason

What I want (and don't think I should reasonably expect) is an expression that behaves precisely as the following two expressions:

^(\d+) MatchMe \[(.*)\]$
^(\d+) I need to be dispatched.*\:\s(.*)$

So those two will match the appropriate lines just duckily. Plus (and more importantly) the values of the $1 and $2 registers are the number and what we'll call the "reason description" (because...uhm... that's what it is.)

I attempted to do this (well, I did it.):

^(\d+) (MatchMe \[(.*)\]$)|(I need to be dispatched.*\:\s(.*)$)

And that will match both lines just fine. But the contents of the registers are, as you might expect, messy (read: values in different places depending on which line is being matched.)
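To make the "messy registers" concrete, here is a minimal sketch that runs the combined pattern above against both sample lines and lists all five capture groups. The helper name `captures` is made up for illustration; the pattern and the lines are taken from the post as written.

```perl
use strict;
use warnings;

# The combined pattern from the post, unchanged.
my $re = qr/^(\d+) (MatchMe \[(.*)\]$)|(I need to be dispatched.*\:\s(.*)$)/;

my @lines = (
    '111 MatchMe [This is why]',
    '222 I need to be dispatched to the same callback: For This Reason',
);

# Hypothetical helper: return the five capture groups for one line,
# with the string 'undef' standing in for any group that did not take part.
sub captures {
    my ($line) = @_;
    my @groups = $line =~ $re or return;
    return map { defined $_ ? $_ : 'undef' } @groups;
}

# Print the groups in order $1 .. $5 for each sample line.
for my $line (@lines) {
    print join(' | ', captures($line)), "\n";
}
```

Because the `|` sits at the top level of the pattern, the second alternative is neither anchored nor preceded by `(\d+)`. The first line therefore fills $1 through $3, while the second line fills only $4 and $5 and never captures the number at all.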

Am I just trying to stretch the semantics of a single regex match too far in my attempt to not disturb the underlying driver code?

What I would LOVE to do is get in there and muck with the driver so I could have N expressions match any particular callback. But frankly I don't think that option is available to me (it's very late in the game and that's very well established core driver code.)

Any fun creative insights?


Thanks again,

- V

Replies are listed 'Best First'.
Re: multiple-line match mucking with regex registers
by GrandFather (Saint) on Jul 10, 2006 at 21:39 UTC

    Use an alternation (|) in a non-capture group to match the alternative bits:

    use warnings;
    use strict;

    while (<DATA>) {
        chomp;
        next if ! /^(\d+) (?:MatchMe \[|I need to be dispatched.*\:\s)(.*?)\]?$/;
        print "Line $1, reason: $2\n";
    }

    __DATA__
    111 MatchMe [This is why]
    222 I need to be dispatched to the same callback: For This Reason

    Prints:

    Line 111, reason: This is why
    Line 222, reason: For This Reason

    DWIM is Perl's answer to Gödel
      Yep. That looks like the right answer.

      I was trying to do something similar, but had it factored wrong.

      w00t! Thanks =)
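For readers wondering how this fits the "one regex per callback" constraint from the question, the following is a rough sketch of a dispatch table. The table layout and the names `@dispatch` and `dispatch_line` are assumptions made for illustration, since the real driver is never shown in the thread; only the pattern comes from the reply above.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: each entry pairs one pattern with one callback.
# The driver layout is an assumption; only the pattern comes from the thread.
my @dispatch = (
    [
        qr/^(\d+) (?:MatchMe \[|I need to be dispatched.*\:\s)(.*?)\]?$/,
        sub {
            my ($number, $reason) = @_;
            return "Line $number, reason: $reason";
        },
    ],
);

# Try each pattern in turn and pass the captures to the first callback that matches.
sub dispatch_line {
    my ($line) = @_;
    for my $entry (@dispatch) {
        my ($re, $callback) = @$entry;
        if ( my @captures = $line =~ $re ) {
            return $callback->(@captures);
        }
    }
    return;    # no pattern matched
}

print dispatch_line($_), "\n" for (
    '111 MatchMe [This is why]',
    '222 I need to be dispatched to the same callback: For This Reason',
);
```

Since both line types land in the same two capture positions, the callback receives the number and the reason in the same order either way, and a driver of this shape would not need to change.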
Re: multiple-line match mucking with regex registers
by BrowserUk (Patriarch) on Jul 10, 2006 at 21:48 UTC

    You could do it this way: (Basically merging the two regexes so that only two sets of capture brackets appear.)

    #! perl -slw
    use strict;

    my @lines = (
        '111 MatchMe [This is why]',
        '222 I need to be dispatched to the same callback: For This Reason',
    );

    for ( @lines ) {
        m/
            ^(\d+) \s
            (?:
                (?-x:I need to be dispatched.*?: ) |
                (?-x:MatchMe \[)
            )
            (.*?) \]?
            $
        /x and print "'$1'$2'";
    }

    __END__
    C:\test>junk
    '111'This is why'
    '222'For This Reason'

    Though you will have to decide whether the possibility of false matches against lines like these (update):

    111 MatchMe [This is why
    222 I need to be dispatched to the same callback: For This Reason]

    is a problem for your application.
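A quick way to see the false matches described above is to feed the two malformed lines to the merged pattern. This is only a sketch: the pattern is the one from this reply rewritten without /x, and the helper name `reason_for` is invented for the example.

```perl
use strict;
use warnings;

# The merged pattern from the reply above, written without /x.
my $simple = qr/^(\d+)\s(?:I need to be dispatched.*?: |MatchMe \[)(.*?)\]?$/;

# Hypothetical helper: return the captured reason, or undef when the line does not match.
sub reason_for {
    my ($line) = @_;
    return $line =~ $simple ? $2 : undef;
}

# Both lines are malformed: one lacks its closing bracket, the other gains a stray one.
for my $line (
    '111 MatchMe [This is why',
    '222 I need to be dispatched to the same callback: For This Reason]',
) {
    my $reason = reason_for($line);
    print defined $reason ? "matched, reason: $reason\n" : "no match\n";
}
```

Both lines match because the trailing `\]?` makes the closing bracket optional for either alternative, so a missing bracket on the first form and a stray bracket on the second both pass.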


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      If I'd thought ahead I'd have used a lookahead like this:

      use warnings;
      use strict;

      while (<DATA>) {
          chomp;
          next if ! /^
              (\d+)\s                     # Capture the number
              (?:                         # Start of non-capture group
                  MatchMe\s\[(?=.*\]$)    # Match matchme [.*]
                  |
                  I\sneed\sto\sbe\sdispatched.*\:\s(?=[^\]]*$)
              )
              (.*?)\]?$/x;                # capture reason
          print "Line $1, reason: $2\n";
      }

      __DATA__
      111 MatchMe [This is why]
      222 I need to be dispatched to the same callback: For This Reason
      112 MatchMe [This is why
      223 I need to be dispatched to the same callback: For This Reason]

      OP should note the positive lookahead assertions (?=...) that prevent the false matches found by the simpler expression. Prints:

      Line 111, reason: This is why
      Line 222, reason: For This Reason

      DWIM is Perl's answer to Gödel
      Ugh. Ya've gotta be so careful with terms when talking about regexes. By "multi-line matching" I didn't mean a match that spans across two lines of data. I meant an expression that would match different 'types' of lines. My apologies for the confusion. That's what I get for trying to put all the particulars in the subject.

      So no, that kind of false positive is not an issue. The lines (log files) I'm matching against are pretty strongly formatted. Plus, these are remarkably oversimplified examples, so that kind of clash would be very very tough to conjure.
Re: multiple-line match mucking with regex registers
by diotalevi (Canon) on Jul 11, 2006 at 02:40 UTC

    You might try using my super-duper named capture variable module.

    use Regexp::NamedCaptures;

    if ( /^(?< \$number >\d+) (?:MatchMe \[(?< \$reason_why >.*)\]|I need to be dispatched.*:\s(?< \$reason_why >.*))$/ ) {
        # $reason_why got set from either $2 or $3, but we don't really
        # care about the numbered variables now.
        print "$number $reason_why\n";
    }
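The same idea can be sketched without the module, assuming Perl 5.10 or later, which postdates this thread and has built-in named captures. The names `number`, `reason`, and `parse_line` are chosen for this example only. When two groups share a name, `$+{name}` refers to the leftmost one that actually matched.

```perl
use strict;
use warnings;
use 5.010;    # built-in named captures and %+ need Perl 5.10 or later

# Both alternatives reuse the name "reason"; %+ reports whichever one matched.
my $re = qr/^(?<number>\d+) (?:MatchMe \[(?<reason>.*)\]|I need to be dispatched.*:\s(?<reason>.*))$/;

# Hypothetical helper: return (number, reason) for a matching line, or an empty list.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ $re;
    return ( $+{number}, $+{reason} );
}

for my $line (
    '111 MatchMe [This is why]',
    '222 I need to be dispatched to the same callback: For This Reason',
) {
    my ($number, $reason) = parse_line($line);
    print "$number $reason\n";
}
```

This only helps if the calling code can read `%+`. A driver that looks strictly at $1 and $2 would still see the reason in $2 or $3 depending on the line.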

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Looks juicy. But ...uhm... one of the reasons this is strictly a regex problem, is... uhh...

      /me shifts about nervously with his hands in his pockets.

      because the actual code is Python