Melly has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monkeys,

I'm having problems working out how to perform the following regex. I want to match foobar:Xgonk where X is between 1 and 5 characters, and can be anything except 'gonk' (or the beginning of 'gonk') or '@'. I then want to remove gonk or @.

So...
foobar:hellogonk should give me foobar:hello
foobar:gonk shouldn't match
foobar:higonk should give me foobar:hi
foobar:helloworldgonk shouldn't match (gonk is too late)

Needless to say, the '@' isn't really a problem, but 'gonk' is giving me a real headache. Any ideas?

BTW this is only part of the regex, so

if(/whatever/ and $& !~ /gonk/)
isn't really much help... and would have problems with the variable length anyway afaik.

Tom Melly, tom@tomandlu.co.uk

Replies are listed 'Best First'.
Re: RegEx - match !foo followed by foo
by merlyn (Sage) on Mar 16, 2006 at 12:56 UTC
    Not sure I completely understand, but does this do it for you?
        if (/^(foobar:((?!gonk|\@).)+)/) { print "matched $1\n"; }
    That's your classic "inchworm" pattern, so it's not the fastest in the world.
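To see what the inchworm pattern does in isolation, here is a minimal sketch. It assumes the lookahead is meant as a plain negative lookahead, (?!gonk|\@), and inchworm_prefix is a hypothetical helper name, not something from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "foobar:" plus everything up to (but not
# including) the first "gonk" or "@", or undef if nothing can be consumed.
sub inchworm_prefix {
    my ($str) = @_;
    # Each step checks that "gonk" or "@" does not start here, then eats
    # exactly one character; that is the "inchworm" movement.
    return $str =~ /^(foobar:(?:(?!gonk|\@).)+)/ ? $1 : undef;
}

for my $s ('foobar:hellogonk', 'foobar:hi@there', 'foobar:helloworld', 'foobar:gonk') {
    my $got = inchworm_prefix($s);
    printf "%-20s => %s\n", $s, defined $got ? $got : '(no match)';
}
```

With this sketch, foobar:hellogonk yields foobar:hello and foobar:hi@there yields foobar:hi, but foobar:helloworld is returned whole: the pattern on its own neither caps the length nor requires gonk or @ to be present.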

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update: Yeah, just like in real life, he changes the spec after I write the implementation. {grin}

    And the spec is still broken:

    I'm having problems working out how to perform the following regex. I want to match foobar:Xgonk where X is between 1 and 5 characters, and can be anything except 'gonk' (or the beginning of 'gonk') or '@'. I then want to remove gonk or @.
    The last "or @" is spurious. There could never be an "@" there, since it has to have a trailing "gonk" where we stop.

      LOL - you think you've got it bad, this wasn't even my specification, and the guy doing the actual spec. had absolutely no idea what he was asking for (or at least was incapable of expressing it).

      It was like asking a 2-year-old for the plot of "The Tale of Peter Rabbit" - they just say "rabbits" over and over again...

      Yeah, in my spec. the "or @" is spurious (and the whole @ is something of a red herring). In RealLife(tm) 'gonk' can either be 'gonk' or @ at all stages (the actual strings are '|' or '\.br\')
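With more than one real delimiter, the same lookahead idea can be driven from a list. The following is only a sketch under assumptions: it keeps the foobar: prefix and the 1 to 5 character limit from the original question, and the names @delims and capture_before_delim are hypothetical. quotemeta is used so that |, . and \ are matched literally.

```perl
use strict;
use warnings;

# Assumed delimiters, taken from the post: a pipe or the literal text \.br\
my @delims = ('|', '\\.br\\');

# Escape each delimiter so regex metacharacters are matched literally,
# then join them into one alternation.
my $delim = join '|', map { quotemeta($_) } @delims;

# 1 to 5 characters, none of which begins a delimiter, then a delimiter.
my $re = qr/(foobar:(?:(?!$delim).){1,5})(?:$delim)/;

# Hypothetical helper: returns the text before the delimiter, or undef.
sub capture_before_delim {
    my ($str) = @_;
    return $str =~ $re ? $1 : undef;
}
```

Building the alternation from a list keeps the lookahead and the trailing delimiter in sync if the set of delimiters changes.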

      Anyway, Corion's final solution got me there... I'd get him to give you a few perl-lessons, he obviously knows way more than you ;)

      Tom Melly, tom@tomandlu.co.uk
        Extra ++ for a good sense of humor. :)

        ---
        It's all fine and dandy until someone has to look at the code.

      Hi Merlyn

      Sorry, my explanation was ambiguous (and had a mistake) - please see my reply to Corion

      Many thanks (we are not worthy)

      Tom Melly, tom@tomandlu.co.uk
Re: RegEx - match !foo followed by foo
by Corion (Patriarch) on Mar 16, 2006 at 13:02 UTC

    I read your problem differently than merlyn does. It seems to me that you want to match only if there is gonk at the end of the string, and you only care for up to the first five letters of the word between foobar: and gonk:

        use strict;
        use Test::More tests => 4;

        sub ungonk {
            local $_ = $_[0];
            if (/^(foobar:.{1,5}).*gonk$/) {
                return $1
            } else {
                return undef
            };
        };

        is ungonk('foobar:hellogonk'), 'foobar:hello';
        is ungonk('foobar:gonk'), undef;
        is ungonk('foobar:higonk'), 'foobar:hi';
        is ungonk('foobar:helloworldgonk'), 'foobar:hello';

      Thanks Corion, but I don't think either merlyn's or your solution works... also I made one mistake in my examples:

      foobar:helloworldgonk shouldn't match, because neither gonk nor @ can be found after no more than 5 preceding characters

      Merlyn's solution fails (afaik) because it doesn't require the non-gonk to be followed by gonk (or @).

      Your solution fails (afaik) because foobar:gonkgonk would return 'foobar:gonk' (and should return nothing).

      To put it another way, if 'gonk' was a single character (say '£'), then I would do:

      /foobar:[^£@]{1,5}[£@]/
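The step from the single-character case to the multi-character one can be sketched as follows. This is an illustration, not a definitive answer: '#' is a hypothetical stand-in for the single character, and capture is a made-up helper name. For single-line input the character class and the one-character-at-a-time lookahead behave the same, and only the lookahead form extends to a string such as 'gonk'.

```perl
use strict;
use warnings;

# '#' is a hypothetical single-character stand-in for 'gonk'.
my $class_re     = qr/(foobar:[^#\@]{1,5})[#\@]/;
my $lookahead_re = qr/(foobar:(?:(?![#\@]).){1,5})[#\@]/;

# The same shape with a multi-character delimiter: a character class
# cannot express "not the string gonk", but the lookahead can.
my $gonk_re = qr/(foobar:(?:(?!gonk|\@).){1,5})(?:gonk|\@)/;

# Hypothetical helper: returns the captured prefix, or undef on no match.
sub capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}
```

Note that these patterns are not anchored at the end, matching the requirement that this is only part of a larger regex.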

      Hope that makes it clearer

      Tom Melly, tom@tomandlu.co.uk

        There are the edge cases of foobar:gonkogonk and foobar:ogonkgonk, which I've added to my test cases below. You need to decide whether gonkogonk and ogonkgonk should be rejected or accepted. My solution rejects the first but accepts the second.

            use strict;
            use Test::More tests => 7;

            sub ungonk {
                local $_ = $_[0];
                if (/^(foobar:(?!gonk).{1,5})gonk$/) {
                    return $1
                } else {
                    return undef
                };
            };

            is ungonk('foobar:hellogonk'), 'foobar:hello';
            is ungonk('foobar:gonk'), undef;
            is ungonk('foobar:higonk'), 'foobar:hi';
            is ungonk('foobar:helloworldgonk'), undef;
            is ungonk('foobar:gonkgonk'), undef;
            is ungonk('foobar:gonkogonk'), undef;
            is ungonk('foobar:ogonkgonk'), 'foobar:ogonk';

        Update: After reading your specification again, you don't want gonk to be found within the first five characters, but it must appear at the end. I think the program below does that, and rejects both :ogonkgonk and :gonkogonk.

            use strict;
            use Test::More tests => 7;

            sub ungonk {
                local $_ = $_[0];
                if (/^(foobar:(?:(?!gonk).){1,5})gonk$/) {
                    return $1
                } else {
                    return undef
                };
            };

            is ungonk('foobar:hellogonk'), 'foobar:hello';
            is ungonk('foobar:gonk'), undef;
            is ungonk('foobar:higonk'), 'foobar:hi';
            is ungonk('foobar:helloworldgonk'), undef;
            is ungonk('foobar:gonkgonk'), undef;
            is ungonk('foobar:gonkogonk'), undef;
            is ungonk('foobar:ogonkgonk'), undef;
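Since the original question also asks to remove the gonk or @ and says the pattern is only part of a larger regex, a substitution may be closer to the final use. The following is a minimal sketch under those assumptions: remove_gonk is a hypothetical helper, the pattern is deliberately not anchored, and the @ alternative is added as described in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: removes the first "gonk" or "@" that follows
# "foobar:" plus 1 to 5 characters, none of which begins "gonk" or is "@".
# Input that does not match is returned unchanged.
sub remove_gonk {
    my ($str) = @_;
    $str =~ s/(foobar:(?:(?!gonk|\@).){1,5})(?:gonk|\@)/$1/;
    return $str;
}

print remove_gonk('foobar:hellogonk and more'), "\n";
print remove_gonk('foobar:helloworldgonk'), "\n";
```

Because the text after the delimiter is never consumed, whatever follows it in the string is left in place.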
Re: RegEx - match !foo followed by foo
by johngg (Canon) on Mar 16, 2006 at 14:14 UTC
    I think a good approach is to break the problem down into sections, build a small compiled regular expression for each one, and then put them together for the final solution. Hopefully I've understood the requirement; the following seems to do the trick.

        #!/usr/local/bin/perl -w
        #
        use strict;

        # Make some building blocks. What we want at the start; what we don't want
        # after "foobar:" using a negative look-ahead assertion; any one to five
        # characters; what we want at the end (which we will discard later).
        #
        our $rxFooBar = qr{foobar:};
        our $rxNotGonkish = qr{(?!(?:g(?:o(?:n(?:k)*)*)*|@))};
        our $rxFive = qr{.{1,5}};
        our $rxGonk = qr{(?:gonk|@)};

        # Put the building blocks together using "()" to capture everything except
        # the "gonk" or "@" at the end;
        #
        our $rxPutItAllTogether = qr{($rxFooBar$rxNotGonkish$rxFive)$rxGonk};

        while(<DATA>)
        {
            chomp;
            print "$_\n";
            print /$rxPutItAllTogether/ ? "$1\n\n" : "Failed\n\n";
        }

        __END__
        foobar:hellogonk
        foobar:gonk
        foobar:higonk
        foobar:helloworldgonk
        foobar:ggonk
        foobar:gogonk
        foobar:gongonk
        foobar:gonkgonk
        foobar:gonkygonk
        foobar:@gonk
        foobar:its@
        foobar:toomany@
    Which produces the following output.

        foobar:hellogonk
        foobar:hello

        foobar:gonk
        Failed

        foobar:higonk
        foobar:hi

        foobar:helloworldgonk
        Failed

        foobar:ggonk
        Failed

        foobar:gogonk
        Failed

        foobar:gongonk
        Failed

        foobar:gonkgonk
        Failed

        foobar:gonkygonk
        Failed

        foobar:@gonk
        Failed

        foobar:its@
        foobar:its

        foobar:toomany@
        Failed

    Cheers,

    JohnGG

Re: RegEx - match !foo followed by foo
by holli (Abbot) on Mar 16, 2006 at 13:09 UTC
    Shouldn't foobar:helloworldgonk give you foobar:helloworld, instead of foobar:hello?


    holli, /regexed monk/

      Sorry Holli, that one was a mistake - it should return nothing (now corrected)

      Tom Melly, tom@tomandlu.co.uk