pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks -

I have been looking into the world of code evaluation expressions of regular expressions recently. Still, I have not been able to come up with a single integrated regex that would match the following set of strings: "Any string of type /a+b+/ that contains exactly one more 'b' than 'a's". The regex I am looking for should therefore match abb, aabbb and aaabbbb but not ab, aab, aabb or abbb.

Clearly, the matching strings could be obtained from the following code:
my @strings = qw( abb aabbb aaabbbb ab aab aabb abbb ); for ( @strings ) { our $a_counter = 0; our $b_counter = 0; if ( /(a(?{$a_counter ++;}))+(b(?{$b_counter ++;}))+/ && $b_counte +r == $a_counter + 1 ) { print "In '$_' there were $a_counter 'a's and $b_counter 'b's. +\n"; } }
As mentioned above, however, I would like to consolidate the information contained in the regex and the if statement into a single regex. It really should match if and only if the number of 'b's exceeds the number of 'a's in the string by one. I don't want any over-generation that I need to filter in a subsequent if statement.

Could anyone please help me out here?

Thanks in advance for your help!

Pat

Replies are listed 'Best First'.
Re: Integrating match counts into regex matching
by BrowserUk (Patriarch) on Dec 18, 2008 at 23:53 UTC

      You used "(?<!a)" and "(?!b)" as anchors, so your solution was designed to match a substring, but it doesn't work for 'a_abb' (false negative) and 'a_abbb' (false positive).

      for (qw[ ab abb abbb aabb aabbb aabbbb a_abb a_abbb ]) { m[ (?<!a) (?> (a+) (?{ length($^N) }) (b+) ) (?(?{ length($^N)-$^R != 1 })(?!)) ]x and print("$_\n"); }
      abb aabbb a_abb

        Indeed++ Hence the "Hmmm?"

        It can be dealt with:

        for( qw[ ab abb abbb aabb aabbb aabbbb a_abb a_abbb abbaabbbaaabbbaxab +b ] ) { print "$_ contains '$1'" while m[ ( (?<!a) (?{ local $a }) ( a (?{++$a}) )+ ( ?{ local $b }) ( b (?{++$b}) )+ (?!b ) ) (?(?{ $a+1 == $b }) (?=) | (?!) ) ]gx };; abb contains 'abb' aabbb contains 'aabbb' a_abb contains 'abb' abbaabbbaaabbbaxabb contains 'abb' abbaabbbaaabbbaxabb contains 'aabbb' abbaabbbaaabbbaxabb contains 'abb'

        But yours is a simpler approach, and probably quicker.

        But the primary purpose of posting was to demonstrate the regex if...then...else construct--not to do someone else's systems testing for them.

        Particularly as this is very unlikely to be the actual pattern they are trying to match


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      Nice two-liner, BrowserUk!

      I like the coding-elegance.

      Just two (small?) bits I don't understand, the rest is crystal-clear:

      1) What does (?<!a) do?

      2) What does (?!b) do?

      Thanks! - Pat
        1. What does (?<!a) do?

          Zero-length negative lookbehind. Ensures that the string of a's is not preceded by another a.

        2. What does (?!b) do?

          Zero-length negative lookahead. Ensures the matched string of b's is not followed by another b.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Integrating match counts into regex matching
by Not_a_Number (Prior) on Dec 19, 2008 at 00:55 UTC

    KISS?

    while ( <DATA> ){ print if /(a+)(b+)/ and length( $1 ) == length( $2 ) - 1; } __DATA__ a b x ab abb abbb aabb aabbb aabbbb ...
Re: Integrating match counts into regex matching
by tilly (Archbishop) on Dec 18, 2008 at 23:41 UTC
    If that is what you want, then that is what you want. But the cure is worse than the disease.
    /^(?>(?:a(?{ $count-- })|b(?{ $count++ }))+)(??{ $count==1 ? "" : "a" })\z/
    Don't forget to have a local our $count; in front of that. And don't forget to buy a bullet-proof vest in case the maintenance programmer ever tracks you down.
      Phew, tilly -

      This looks like a tricky one. Not sure I fully decipher it but I venture a transcript of your code here: "If the regex matches an 'a', decrease $count, if it matches a 'b', increase $count. Repeat as long as you can. When all the matching is done, check the value of $count. If it is '1' return the empty string, else return 'a'. Is this anywhere near the truth?

      Also not sure what the \z does.

      Thanks for your help though!

      Cheers - Pat
        That is correct. The \z means end of string. It keeps you from matching things like "ben".

        But I misread your problem. I will match things like "bab" which is not according to your spec.

Re: Integrating match counts into regex matching
by ig (Vicar) on Dec 19, 2008 at 01:53 UTC

    I agree with tilly and Not_a_Number about keeping it simple but...

    I find the following easier to understand. Of the alternatives I have seen for putting everything in the RE I find the following easier to understand.

    m[ ^ (a*) (??{ 'b' x ( length($1) + 1 ) }) $ ]x and print "'$_'\n" for qw[ ab abb abbb aabb aabbb aabbbb xaabbby ]; 'abb' 'aabbb'

    Update: I mean easier to understand than the alternatives that have been proposed for putting everything into the RE, not easier to understand than separating the condition. I would do it as Not_a_Number suggested myself.

    Update: Note that this solution is only correct for strings matching the pattern /^a+b+$/ and decides differently than the test in the original post for many of the broader set of strings matching /a+b+/ (e.g. "ababb", "babbb", "acabbb" and many others. It can only be made more correct by making it much more complicated, so really not worth pursuing.

Re: Integrating match counts into regex matching
by pat_mc (Pilgrim) on Dec 19, 2008 at 11:19 UTC
    Thanks to all who contributed and shared their knowledgeable views on how this problem could be solved! As always, I am fascinated by the wide range of answers obtained which - in my view - mainly originates from the different degrees of literality with which the initial question can be read. Particular thanks to tilly, BrowserUk and ikegami for their in-depth discussion of the wider scope of the problem.