PerlMonks  

Re^2: Regexes: finding ALL matches (including overlap)

by nobull (Friar)
on Jun 04, 2005 at 09:11 UTC ( [id://463497] )


in reply to Re: Regexes: finding ALL matches (including overlap)
in thread Regexes: finding ALL matches (including overlap)

"It's a little messy to reuse this, because to do it programmatically requires use re 'eval'"
No, you can (and should) use qr// to avoid this.
local our $count;
my $inc_count = qr/(?{$count++})/;
/..*..*.$inc_count(?!)/;

Update: the declaration should be local our, not my.

Re^3: Regexes: finding ALL matches (including overlap)
by blokhead (Monsignor) on Jun 04, 2005 at 12:21 UTC
    You're right, use re 'eval' is not absolutely required, and I shouldn't have said it like that. But beware! Your example code works fine on an instance-by-instance basis. If you want to do this programmatically and extensibly, though, my warning about closing over lexicals applies. It's tricky to write a general-purpose sub that does this kind of matching.

    You may be tempted to do the following, but it won't work:

    sub match_all_ways {
        my ($string, $regex) = @_;
        my $count;
        my $incr = qr/(?{$count++})/;
        $string =~ /(?:$regex)$incr(?!)/;
        return $count;
    }
    print match_all_ways("abcdef", qr/..*..*./);  # 20
    print match_all_ways("abcdef", qr/..*..*./);  # undef

    It's because the code block inside the qr// object is compiled just once and always refers to the first instance of $count. If you call this sub more than once, every call after the first returns undef.

    You have to do something ugly like this to get around it:

    sub match_all_ways {
        use vars '$count';
        my ($string, $regex) = @_;
        local $count = 0;
        my $incr = qr/(?{$count++})/;
        $string =~ /(?:$regex)$incr(?!)/;
        return $count;
    }

    or this:

    {
        my $count;
        my $incr = qr/(?{$count++})/;
        sub match_all_ways {
            my ($string, $regex) = @_;
            $count = 0;
            $string =~ /(?:$regex)$incr(?!)/;
            return $count;
        }
    }
    So yes, it can be done programmatically without use re 'eval', but it's non-trivial and a little messy ;)

    blokhead

      sub match_all_ways {
          my ($string, $regex) = @_;
          my $count;
          my $incr = qr/(?{$count++})/;
          $string =~ /(?:$regex)$incr(?!)/;
          return $count;
      }
      print match_all_ways("abcdef", qr/..*..*./);  # 20
      print match_all_ways("abcdef", qr/..*..*./);  # undef

      It's because the code block inside the qr// object is compiled just once and always refers to the first instance of $count. If you call this sub more than once, every call after the first returns undef.

      I see what you mean about lexicals closed over in regexes not behaving as one would expect. I would have expected the second print to produce 40 instead of undef (i.e. I would have expected $count to behave like a C static variable, as is the case for "regular" closures). Is there any way to rationalize the actual behavior without diving too deeply into the Perl internals? (I ask because without some rationalization for such odd behavior there is little chance I will remember it.)
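For comparison, here is a minimal sketch of how an ordinary anonymous-sub closure treats a lexical declared inside a sub. The name make_counter is hypothetical and not from the thread. Each call to the enclosing sub creates a fresh $count, and each closure keeps the instance it was created with, which is the behavior the qr// code block in the example above fails to reproduce.

```perl
use strict;
use warnings;

# Hypothetical helper: every call creates a new lexical $count and
# returns an anonymous sub bound to that particular instance.
sub make_counter {
    my $count = 0;
    return sub { return ++$count };
}

my $first  = make_counter();
my $second = make_counter();

$first->();
$first->();
print $first->(),  "\n";   # 3: $first keeps its own $count between calls
print $second->(), "\n";   # 1: $second closed over a different $count
```

So a regular closure is persistent per closure, not shared across calls of the enclosing sub the way a C static would be.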

      the lowliest monk

        Is there any way to rationalize the actual behavior without diving too deeply into the Perl internals?

        It's very simple: it's a bug. (In an experimental feature.) Basically, the way perl's regex engine handles embedded code is subtly wrong in a number of ways. One aspect of this is that you should never use lexicals inside regexes inside a repeatable scope (such as the body of a loop or a subroutine). If you are doing a one-off it will probably work as you expect, but as soon as you put the code in a subroutine or something similar and call it twice, things don't work out properly. The simple workaround, as blokhead explained, is to use package-level variables and local.
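The workaround can be sketched with our instead of use vars. This is only a restatement of blokhead's first variant under that one assumption, not a different technique, and the expected count of 20 is the figure given earlier in the thread.

```perl
use strict;
use warnings;

# Package variable: the code block refers to the single global $count,
# so there is no per-call lexical instance for the regex to lose track of.
our $count;

sub match_all_ways {
    my ($string, $regex) = @_;
    local $count = 0;                   # temporary value for this call only
    my $incr = qr/(?{ $count++ })/;     # counts each time the pattern gets this far
    $string =~ /(?:$regex)$incr(?!)/;   # (?!) always fails, forcing every backtrack
    return $count;
}

print match_all_ways("abcdef", qr/..*..*./), "\n";   # 20
print match_all_ways("abcdef", qr/..*..*./), "\n";   # 20 again on the second call
```

Because local restores the previous value of $count when the sub returns, repeated calls do not interfere with each other.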

        I believe dave_the_m intends to fix this one day. Until then, pay careful attention to the fact that embedded code is advertised as provisional and experimental, which means that you can't really cry too much when it breaks.

        ---
        $world=~s/war/peace/g
