in reply to Re: Efficient regex matching with qr//; Can I do better?
in thread Efficient regex matching with qr//; Can I do better?

OK, so I managed to get version 5.10 onto my local machine at least. But I am not really sure how one would construct such large, aggregated regexes. I didn't get any further than this:
if ($text =~ /(?<p1>\b$pattern1\b)|(?<p2>\b$pattern2\b)/) {
    foreach my $foo (keys %+) {
        print $foo.','.$+{$foo}."\n";
    }
}
but then of course I only get one of the matches (even if both match).

Could you please give me a little example of such a construct? The documentation didn't help me much, I'm afraid.

Many thanks, Ola

Re^3: Efficient regex matching with qr//; Can I do better?
by moritz (Cardinal) on Jul 14, 2008 at 12:30 UTC
    If you want the alternations to match at the same starting position, you might be able to fiddle something together with look-ahead groups (not sure it works), but generally that doesn't work very well.

    You could try to match once, reset pos to the previous starting position, remove the regex that caused the match, and retry. But I don't think that's very efficient.

    If you don't want to match at the same position, you can use the /g modifier in a while loop to match multiple times.

      So you're suggesting something like
      while ($text =~ /(\b$pattern1\b)|(\b$pattern2\b)/g) {
          # Do something with $+
      }
      is that right? But then I'm unable to match at the same position. I have probably misunderstood you somehow, because this solution doesn't need named captures as far as I can tell.

      If this is what you actually meant, what do you think about an iterative solution where you remove the matched patterns and match again until you find no more? I have absolutely no idea whether that would speed things up in the end, though...
        "I have probably misunderstood you somehow because this solution doesn't need named captures as far as I can tell."

        It only needs named captures if you want to know which one of the regexes matched.
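To illustrate the point about named captures, here is a minimal sketch of the /g loop that reports which alternative matched on each pass. The sample patterns and text are made up for illustration; only the regex shape comes from the thread.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Hypothetical sample patterns and text, for illustration only.
my $pattern1 = qr/foo/;
my $pattern2 = qr/bar/;
my $text     = 'foo and bar, then foo again';

my @found;
while ($text =~ /(?<p1>\b$pattern1\b)|(?<p2>\b$pattern2\b)/g) {
    # Only the alternative that actually matched has a key in %+.
    my ($name) = keys %+;
    push @found, $name . '=' . $+{$name};
}
say for @found;
```

Each pass continues from where the previous match ended, so this finds successive matches but never two patterns at the same position.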

        "But then I'm unable to match at the same position."

        Yes, that's true. If you really need it, and only have a relatively small number of matches, you can do something along these lines:

        my $re = assemble_regex(\%hash);
        my $old_pos = pos;
        while (m/\G$re/) {
            pos() = $old_pos;    # reset match position
            # ... extract name of matched regex in $matched_re here ...
            delete $hash{$matched_re};
            # re-generate regex, without the one that previously matched:
            $re = assemble_regex(\%hash);
        }

        Note that it'll be rather expensive to build the regex many times, so only do this if you have a relatively low number of matches.
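A runnable sketch of this match, delete, and rebuild loop might look like the following. The body of assemble_regex, the sample patterns, and the use of the hash keys as capture-group names are all assumptions made for illustration; the thread only names the function.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: join the remaining patterns into one alternation,
# wrapping each in a named group so that %+ reveals which one matched.
# Assumes the hash keys are valid capture-group names.
sub assemble_regex {
    my ($patterns) = @_;
    my $alternation = join '|',
        map { "(?<$_>$patterns->{$_})" } sort keys %$patterns;
    return qr/$alternation/;
}

# Hypothetical sample data: all three patterns can match at offset 0.
my %hash = (
    word  => qr/\w+/,
    foo   => qr/foo/,
    lower => qr/[a-z]+/,
);
my $text = 'foobar baz';

my $start = 0;
my @matched;
while (%hash) {
    my $re = assemble_regex(\%hash);    # rebuilt on every pass (the costly part)
    pos($text) = $start;                # always retry from the same offset
    last unless $text =~ /\G$re/gc;
    my ($name) = keys %+;               # only the matching group is set
    push @matched, $name . '=' . $+{$name};
    delete $hash{$name};                # drop it and look for the next one
}
say for @matched;
```

The regex is recompiled once per matching pattern, which is why this only pays off when few patterns match at a given position.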