Integrating match counts into regex matching

pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Integrating match counts into regex matching by BrowserUk (Patriarch) on Dec 18, 2008 at 23:53 UTC
Hmmm? `our( $a, $b )=(0,0), m[ ( (?<!a) ( a (?{++$a}) )+ ( b (?{++$b}) )+ (?!b) ) (?(?{ $a+1 == $b }) (?=) \| (?!) ) ]x and print "'$1'" for qw[ ab abb abbb aabb aabbb aabbbb ];; 'abb' 'aabbb'` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: Integrating match counts into regex matching by ikegami (Patriarch) on Dec 19, 2008 at 08:49 UTC
You used "(?<!a)" and "(?!b)" as anchors, so your solution was designed to match a substring, but it doesn't work for 'a_abb' (false negative) and 'a_abbb' (false positive). `for (qw[ ab abb abbb aabb aabbb aabbbb a_abb a_abbb ]) { m[ (?<!a) (?> (a+) (?{ length($^N) }) (b+) ) (?(?{ length($^N)-$^R != 1 })(?!)) ]x and print("$_\n"); }` [download] `abb aabbb a_abb` [download]	[reply] [d/l] [select]
Re^3: Integrating match counts into regex matching by BrowserUk (Patriarch) on Dec 19, 2008 at 09:29 UTC
Indeed++ Hence the "Hmmm?" It can be dealt with: `for( qw[ ab abb abbb aabb aabbb aabbbb a_abb a_abbb abbaabbbaaabbbaxab +b ] ) { print "$_ contains '$1'" while m[ ( (?<!a) (?{ local $a }) ( a (?{++$a}) )+ ( ?{ local $b }) ( b (?{++$b}) )+ (?!b ) ) (?(?{ $a+1 == $b }) (?=) \| (?!) ) ]gx };; abb contains 'abb' aabbb contains 'aabbb' a_abb contains 'abb' abbaabbbaaabbbaxabb contains 'abb' abbaabbbaaabbbaxabb contains 'aabbb' abbaabbbaaabbbaxabb contains 'abb'` [download] But yours is a simpler approach, and probably quicker. But the primary purpose of posting was to demonstrate the regex if...then...else construct--not to do someone else's systems testing for them. Particularly as this is very unlikely to be the actual pattern they are trying to match Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: Integrating match counts into regex matching by pat_mc (Pilgrim) on Dec 19, 2008 at 00:04 UTC
Nice two-liner, BrowserUk! I like the coding-elegance. Just two (small?) bits I don't understand, the rest is crystal-clear: 1) What does `(?<!a)` do? 2) What does `(?!b)` do? Thanks! - Pat	[reply] [d/l] [select]
Re^3: Integrating match counts into regex matching by BrowserUk (Patriarch) on Dec 19, 2008 at 00:10 UTC
What does `(?<!a)` do? Zero-length negative lookbehind. Ensures that the string of a's is not preceded by another a. What does `(?!b)` do? Zero-length negative lookahead. Ensures the matched string of b's is not followed by another b. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l] [select]
Re^4: Integrating match counts into regex matching by ikegami (Patriarch) on Dec 19, 2008 at 08:21 UTC
Re^5: Integrating match counts into regex matching by BrowserUk (Patriarch) on Dec 19, 2008 at 08:54 UTC
Some notes below your chosen depth have not been shown here
Re: Integrating match counts into regex matching by Not_a_Number (Prior) on Dec 19, 2008 at 00:55 UTC
KISS? `while ( <DATA> ){ print if /(a+)(b+)/ and length( $1 ) == length( $2 ) - 1; } __DATA__ a b x ab abb abbb aabb aabbb aabbbb ...` [download]	[reply] [d/l]
Re: Integrating match counts into regex matching by tilly (Archbishop) on Dec 18, 2008 at 23:41 UTC
If that is what you want, then that is what you want. But the cure is worse than the disease. `/^(?>(?:a(?{ $count-- })\|b(?{ $count++ }))+)(??{ $count==1 ? "" : "a" })\z/` [download] Don't forget to have a `local our $count;` in front of that. And don't forget to buy a bullet-proof vest in case the maintenance programmer ever tracks you down.	[reply] [d/l] [select]
Re^2: Integrating match counts into regex matching by pat_mc (Pilgrim) on Dec 18, 2008 at 23:57 UTC
Phew, tilly - This looks like a tricky one. Not sure I fully decipher it but I venture a transcript of your code here: "If the regex matches an 'a', decrease `$count`, if it matches a 'b', increase `$count`. Repeat as long as you can. When all the matching is done, check the value of `$count`. If it is '1' return the empty string, else return 'a'. Is this anywhere near the truth? Also not sure what the `\z` does. Thanks for your help though! Cheers - Pat	[reply] [d/l] [select]
Re^3: Integrating match counts into regex matching by tilly (Archbishop) on Dec 19, 2008 at 00:06 UTC
That is correct. The \z means end of string. It keeps you from matching things like "ben". But I misread your problem. I will match things like "bab" which is not according to your spec.	[reply]
Re: Integrating match counts into regex matching by ig (Vicar) on Dec 19, 2008 at 01:53 UTC
I agree with tilly and Not_a_Number about keeping it simple but... ~~I find the following easier to understand.~~ Of the alternatives I have seen for putting everything in the RE I find the following easier to understand. `m[ ^ (a) (??{ 'b' x ( length($1) + 1 ) }) $ ]x and print "'$_'\n" for qw[ ab abb abbb aabb aabbb aabbbb xaabbby ]; 'abb' 'aabbb'` [download] Update:* I mean easier to understand than the alternatives that have been proposed for putting everything into the RE, not easier to understand than separating the condition. I would do it as Not_a_Number suggested myself. Update: Note that this solution is only correct for strings matching the pattern /^a+b+$/ and decides differently than the test in the original post for many of the broader set of strings matching /a+b+/ (e.g. "ababb", "babbb", "acabbb" and many others. It can only be made more correct by making it much more complicated, so really not worth pursuing.	[reply] [d/l]
Re: Integrating match counts into regex matching by pat_mc (Pilgrim) on Dec 19, 2008 at 11:19 UTC
Thanks to all who contributed and shared their knowledgeable views on how this problem could be solved! As always, I am fascinated by the wide range of answers obtained which - in my view - mainly originates from the different degrees of literality with which the initial question can be read. Particular thanks to tilly, BrowserUk and ikegami for their in-depth discussion of the wider scope of the problem.	[reply]