Re: Regexes: finding ALL matches (including overlap)

Replies are listed 'Best First'.
Re^2: Regexes: finding ALL matches (including overlap) by nobull (Friar) on Jun 04, 2005 at 09:11 UTC
> It's a little messy to reuse this, because to do it programmatically requires `use re 'eval'`

No, you can (and should) use `qr//` to avoid this.

    local our $count;
    my $inc_count = qr/(?{$count++})/;
    /.....$inc_count(?!)/;

Update: `local our` not `my`.
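For anyone who wants to run this as-is, here is a minimal sketch with a concrete pattern filled in. The string `'aaab'` and the pattern `a+b?` are stand-ins of my own for the elided `.....`, and a plain `our` replaces `local our` only because the snippet runs at file scope.

```perl
use strict;
use warnings;

# Package variable: the code block compiled into the qr// object refers to it by name.
our $count = 0;
my $inc_count = qr/(?{ $count++ })/;

# (?!) can never match, so the engine backtracks through every way of
# matching a+b? at every starting position, running the code block each time.
'aaab' =~ /a+b?$inc_count(?!)/;

print "$count\n";   # prints 9
```

The nine matches are `aaab`, `aaa`, `aa`, `a`, `aab`, `aa`, `a`, `ab`, `a`; the overall match itself fails, which is expected.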
Re^3: Regexes: finding ALL matches (including overlap) by blokhead (Monsignor) on Jun 04, 2005 at 12:21 UTC
You're right, `use re 'eval'` is not absolutely required, and I shouldn't have said it like that. But beware! Your example code works fine on an instance-by-instance basis. If you want to do this programmatically and extensibly, though, my warning about closing over lexicals applies. It's tricky to make a generic-use sub that does this kind of matching. You may be tempted to do the following, but it won't work:

    sub match_all_ways {
        my ($string, $regex) = @_;
        my $count;
        my $incr = qr/(?{$count++})/;
        $string =~ /(?:$regex)$incr(?!)/;
        return $count;
    }
    print match_all_ways("abcdef", qr/...../);  # 20
    print match_all_ways("abcdef", qr/...../);  # undef

It's because the qr// object is compiled just once and always refers to the first instance of $count. If you call this sub more than once, every call after the first returns undef. You have to do something ugly like this to get around it:

    sub match_all_ways {
        use vars '$count';
        my ($string, $regex) = @_;
        local $count = 0;
        my $incr = qr/(?{$count++})/;
        $string =~ /(?:$regex)$incr(?!)/;
        return $count;
    }

or this:

    {
        my $count;
        my $incr = qr/(?{$count++})/;
        sub match_all_ways {
            my ($string, $regex) = @_;
            $count = 0;
            $string =~ /(?:$regex)$incr(?!)/;
            return $count;
        }
    }

So yes, it can be done programmatically without `use re 'eval'`, but it's non-trivial and a little messy ;)

blokhead
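The package-variable workaround extends from counting matches to collecting them. A rough sketch, assuming a hypothetical `all_matches` helper and a package array `@found` (both names are mine, not from the thread); inside the code block, `$&` holds the text matched so far.

```perl
use strict;
use warnings;

# Package array, so the code block in the qr// always sees the current list
# instead of closing over a lexical from an earlier call.
our @found;

# Hypothetical helper: returns every way $regex can match inside $string,
# overlapping matches included, in the order the engine finds them.
sub all_matches {
    my ($string, $regex) = @_;
    local @found = ();                          # fresh list for this call
    my $collect = qr/(?{ push @found, $& })/;   # record the current match text
    $string =~ /(?:$regex)$collect(?!)/;        # (?!) forces exhaustive backtracking
    my @result = @found;                        # copy before the local is undone
    return @result;
}

print join(",", all_matches("aaab", qr/a+b?/)), "\n";
# prints aaab,aaa,aa,a,aab,aa,a,ab,a
```

Because `@found` is localized on every call, a second call starts from an empty list rather than appending to the first call's results.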
Re^4: Regexes: finding ALL matches (including overlap) by tlm (Prior) on Jun 04, 2005 at 13:45 UTC
>     sub match_all_ways {
>         my ($string, $regex) = @_;
>         my $count;
>         my $incr = qr/(?{$count++})/;
>         $string =~ /(?:$regex)$incr(?!)/;
>         return $count;
>     }
>     print match_all_ways("abcdef", qr/...../);  # 20
>     print match_all_ways("abcdef", qr/...../);  # undef
>
> It's because the qr// object is compiled just once and always refers to the first instance of $count. If you call this sub more than once, every call after the first returns undef.

I see what you mean by lexicals closed over in regexes not behaving as one would expect. I would have expected the second `print` to produce 40 instead of `undef` (i.e. I would have expected `$count` to behave like a C static variable, as is the case for "regular" closures). Is there any way to rationalize the actual behavior without diving too deeply into the Perl internals? (I ask because without some rationalization for such odd behavior there is little chance I will remember it.)

the lowliest monk
Re^5: Regexes: finding ALL matches (including overlap) (its a bug) by demerphq (Chancellor) on Jun 04, 2005 at 16:47 UTC
Re^2: Regexes: finding ALL matches (including overlap) by kaif (Friar) on Jun 04, 2005 at 04:23 UTC
Great! This is exactly the code idea I wanted. Are there any other ways without using such a construct (just for the sake of TIMTOWTDI)? I was always unsure of the level of support for enclosing code within regexen. Do you know what kinds of things can go wrong?
Re^3: Regexes: finding ALL matches (including overlap) by ikegami (Patriarch) on Jun 04, 2005 at 04:51 UTC
> Do you know what kinds of things can go wrong?

Backtracking can screw things up:

    my $count;
    'ac' =~ /
       a (?{ $count++ }) b
    |
       a (?{ $count++ }) c
    /x;
    # 1. Matches 'a' in first branch.
    # 2. Increments $count to 1.
    # 3. Fails to match 'b'.
    # 4. Matches 'a' in second branch.
    # 5. Increments $count to 2.
    # 6. Matches 'c'.
    print("$count\n");  # 2

The fix is to use `local`. When the regexp backtracks through a `local`, the old value is restored. The old value is also restored when the regexp successfully matches, so you need to save the result.

    my $count;
    our $c = 0;
    'ac' =~ /
       (?:
          a (?{ local $c = $c + 1 }) b
       |
          a (?{ local $c = $c + 1 }) c
       )
       (?{ $count = $c })  # Save result.
    /x;
    # 1. Matches 'a' in first branch.
    # 2. Increments $c to 1.
    # 3. Fails to match 'b'.
    # 4. Undoes increment ($c = 0).
    # 5. Matches 'a' in second branch.
    # 6. Increments $c to 1.
    # 7. Matches 'c'.
    # 8. $count = $c.
    print("$count\n");  # 1
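The same unwinding applies inside a quantified group. The sketch below is adapted from the example in perlre's `(?{ code })` section; the `our` declarations are my addition so that it runs under `strict`.

```perl
use strict;
use warnings;

our $cnt = 0;   # localized inside the pattern, so it must be a package variable
our $res;       # receives the final tally

('a' x 8) =~ m<
    (?{ $cnt = 0 })                   # initialize
    (?:
        a
        (?{ local $cnt = $cnt + 1 })  # update; undone if the engine backtracks past it
    )*
    aaaa
    (?{ $res = $cnt })                # on success, copy the tally to a non-localized variable
>x;

print "$res\n";   # prints 4
```

The group first consumes all eight `a` characters, taking `$cnt` up to 8. The trailing `aaaa` then forces the engine to give four of them back, and each step of backtracking restores the previous value of `$cnt`, so 4 is what gets saved.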
Re^2: Regexes: finding ALL matches (including overlap) by kaif (Friar) on Jun 04, 2005 at 18:00 UTC
I just read through perlre and couldn't find one thing: how does one include a (code-based) conditional expression in a regex, analogous to actions in P::RD? Is it even possible? If so, then one could not only find the last match (which may differ slightly from reversing the result of a reversed regex):

    "abcdef" =~ /(.....)(?{$last = $^N})(?!)/;
    print "[$last]\n";  ## "[def]"

but also the (say) tenth match.

Another solution to my problem would be possible if P::RD had non-greedy matches. Is it likely that this will be implemented soon? I guess I could try hacking on it myself.

P.S.: Has anyone ever used customre? Super Search gave back only one result ...
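On the conditional question: perlre's `(?(condition)yes-pattern|no-pattern)` construct accepts a `(?{ CODE })` block as its condition, which looks like what is being asked for. A minimal sketch of picking the tenth match that way, assuming `[a-z]{3,}` as a stand-in pattern (the pattern in the post is elided) and hypothetical variables `$n`, `$want` and `$tenth`:

```perl
use strict;
use warnings;

our $n    = 0;    # how many complete matches of the group have been seen so far
our $want = 10;   # which match to stop at

# After each match of the group, bump $n. It is deliberately not localized,
# so the tally survives backtracking. While $n < $want the conditional selects
# (?!), which fails and forces the engine on to the next possibility. Once $n
# reaches $want the conditional matches the empty string and the match succeeds.
"abcdef" =~ /([a-z]{3,})(?{ $n++ })(?(?{ $n < $want })(?!))/
    or die "fewer than $want matches\n";

my $tenth = $1;
print "[$tenth]\n";   # prints [def]
```

With this stand-in pattern the engine visits `abcdef`, `abcde`, `abcd`, `abc`, `bcdef`, `bcde`, `bcd`, `cdef`, `cde`, `def`, so the tenth match is also the last one.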