in reply to Re: all matches
in thread all matches

But this one doesn't handle overlapping cases, and it matches either greedily or non-greedily but misses the ones not in between, doesn't it?

Re: Re: Re: all matches
by diotalevi (Canon) on Mar 25, 2004 at 13:25 UTC

    Picky, picky! Just reset pos() and it does that. The difference between your code and mine is that the behaviour of mine is well defined and I don't use anything marked "experimental." Whether the re engine finds an optimization which allows it to skip some execution is occasionally a problem with the sort of code that was originally proposed.

    An update. It pleased me to show just the added line by itself and then it occurred to me that this is better done in a continue block so that next() won't accidentally skip the positioning.

    while ( ... ) {
        ...
    }
    continue {
        # Reset the position of the regex match so that it will restart
        # just after the start of this match. This is done inside continue {}
        # to safeguard itself against a next() that someone else might
        # add later.
        pos() = $-[0] + 1;
    }

    while ( /($regex)/gx ) {
        push @substrings, {
            match  => $1,
            pos    => ( pos() - length $1 ),
            length => length $1,
        };

        # Restart the checking at one character after the match started.
        pos() = $-[0] + 1;
    }
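
To make the effect of the pos() reset concrete, here is a minimal, self-contained sketch. The sample string and the pattern `[AH]{2}` are assumptions chosen only to produce visible overlaps; they are not from the original post. It uses an explicit `$seq` variable rather than `$_`, and stores `$-[0]` as the start offset.

```perl
use strict;
use warnings;

# Hypothetical sample data, picked only so that matches overlap.
my $seq   = 'AHAAHKAKADL';
my $regex = qr/[AH]{2}/;

my @substrings;
while ( $seq =~ /($regex)/g ) {
    push @substrings, {
        match  => $1,
        pos    => $-[0],        # offset where this match started
        length => length $1,
    };

    # Restart the search one character after the start of this match,
    # so that matches overlapping the current one are still found.
    pos($seq) = $-[0] + 1;
}

printf "%s at %d\n", $_->{match}, $_->{pos} for @substrings;
# prints:
#   AH at 0
#   HA at 1
#   AA at 2
#   AH at 3
```

Without the pos() assignment the same loop would report only `AH at 0` and `AA at 2`, because each new search would begin where the previous match ended.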
      Let me ask you further :) I tried the code below:

      $_ = "AHAAHKAKADL";
      while ( /^(.)*(H)(.)*(K)(.)*(D)(.)*$/gx ) {
          print $1."\n";
          print $2."\n";
          print $3."\n";
          print $4."\n";
          print $5."\n";
          print $6."\n";
          print $7."\n";
          print "-------\n";

          # Restart the checking at one character after the match started.
          pos() = $-[0] + 1;
      }

      but that didn't work as I expected. Actually, my intention is to divide up the sequence in all combinations with patterns (these can be regexes with wildcards and all sorts of things, and the number of patterns depends on the user input) and to keep the positions and lengths of the matches of all the patterns and of the substrings in between. So I thought maybe I could put them in a single pattern like the following:

      /^(.)*($pat1)(.)*($pat2)(.)*($pat3)(.)*$/

      and match this against the whole sequence to extract the matches and the substrings in between, before and after. The example above is a simple case where the patterns are single characters. I would really appreciate any new point of view. Thanks.

        Oh I see. That's more difficult since perl's regex engine is designed to stop anytime it finds a match. You're after every possible match which is something I hear a POSIX regex engine will do by default. I'd be wary of trying to trick perl's engine with your trailing (?!) because the thing is designed to try to stop early and IIRC will try to avoid branches when possible, especially exponential permutations which is what you're after.
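
As a small illustration of the "stops at the first match" behaviour, the sketch below runs the anchored pattern from the question against the same sequence and records what each iteration captures. The `@found` array and the explicit `$seq` variable are additions for the demonstration only.

```perl
use strict;
use warnings;

my $seq = 'AHAAHKAKADL';
my @found;

while ( $seq =~ /^(.)*(H)(.)*(K)(.)*(D)(.)*$/g ) {
    # A quantified group such as (.)* keeps only its last repetition,
    # so $1, $3, $5 and $7 each hold a single character.
    push @found, join ' ', $1, $2, $3, $4, $5, $6, $7;

    # Move the start position forward. Because the pattern begins
    # with ^, it cannot match from any position other than 0, so the
    # next attempt fails and the loop ends.
    pos($seq) = $-[0] + 1;
}

print scalar(@found), " match: $found[0]\n";
# prints "1 match: A H A K A D L"
```

The engine settles on one way of splitting the sequence (the greedy one) and never reports the alternatives, which is why the loop body runs only once.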

        Perhaps you should write your program in a language that has a POSIX engine instead.
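
For what it's worth, the enumeration can also be done outside the regex engine. This is a rough sketch of an alternative that nobody in this thread proposed: collect the hits of each pattern separately with the pos() reset, then combine them in order in plain Perl. The helper names `all_hits` and `combos` are hypothetical. One assumption to note: `all_hits` records only one match per start position (whichever the engine prefers), so for variable-length patterns it does not list every possible length.

```perl
use strict;
use warnings;

# Return [start, length] for each match of $re in $seq, trying every
# start position so that overlapping matches are included.
sub all_hits {
    my ( $seq, $re ) = @_;
    my @hits;
    while ( $seq =~ /$re/g ) {
        push @hits, [ $-[0], $+[0] - $-[0] ];
        last if $-[0] >= length $seq;    # guard for an empty match at the end
        pos($seq) = $-[0] + 1;
    }
    return @hits;
}

# Pick one hit from each list, in list order, such that every hit
# starts at or after the end of the previous one.
sub combos {
    my ( $from, @lists ) = @_;
    return ( [] ) unless @lists;
    my ( $first, @rest ) = @lists;
    my @out;
    for my $hit (@$first) {
        my ( $start, $len ) = @$hit;
        next if $start < $from;
        push @out, map { [ $hit, @$_ ] } combos( $start + $len, @rest );
    }
    return @out;
}

my $seq      = 'AHAAHKAKADL';
my @patterns = ( qr/H/, qr/K/, qr/D/ );
my @lists    = map { [ all_hits( $seq, $_ ) ] } @patterns;

for my $combo ( combos( 0, @lists ) ) {
    print join( ' ', map { substr( $seq, $_->[0], $_->[1] ) . '@' . $_->[0] } @$combo ), "\n";
}
# prints:
#   H@1 K@5 D@9
#   H@1 K@7 D@9
#   H@4 K@5 D@9
#   H@4 K@7 D@9
```

Since each hit carries its start and length, the substrings before, between and after the hits can be cut out with substr(). The number of combinations can grow very quickly with longer sequences or more patterns.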