in reply to all matches

This is better written as a while loop.

    while ( /($regex)/gx ) {
        push @substrings, {
            match  => $1,
            pos    => ( pos() - length $1 ),
            length => length $1,
        };
    }
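A self-contained version of this loop might look like the following. The string `ABACADA` and the pattern `A.A` are made up for illustration. It reads the start offset of the capture from `$-[1]` (Perl's `@-` array holds the start offsets of the last successful match) instead of computing `pos() - length $1`. Both give the same value here, because the capture spans the whole match.

```perl
use strict;
use warnings;

# Hypothetical input and pattern, chosen only for illustration.
my $regex = 'A.A';
$_ = 'ABACADA';

my @substrings;
while ( /($regex)/gx ) {
    push @substrings, {
        match  => $1,
        pos    => $-[1],            # start offset of capture group 1
        length => $+[1] - $-[1],    # end minus start, same as length $1
    };
}

for my $s (@substrings) {
    printf "%s at %d (length %d)\n", @{$s}{qw(match pos length)};
}
```

Because plain `//g` resumes the search where the previous match ended, this finds `ABA` at 0 and `ADA` at 4, but not the overlapping `ACA` at 2.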

Re: Re: all matches
by oz (Novice) on Mar 25, 2004 at 06:54 UTC
    But this one doesn't handle overlapping cases, does it? It matches either greedily or non-greedily, but it misses the other possible matches.

      Picky, picky! Just reset pos() and it does that. The difference between your code and mine is that the behaviour of mine is well defined and doesn't rely on anything marked "experimental." With the sort of code that was originally proposed, the results occasionally depend on whether the regex engine finds an optimization that lets it skip some of the execution.

      Update: I first chose to show just the added line by itself, but then it occurred to me that this is better done in a continue block, so that a next() in the loop body can't accidentally skip the repositioning.

      while ( ... ) {
          ...
      }
      continue {
          # Reset the position of the regex match so that it will restart
          # just after the start of this match. This is done inside continue {}
          # to safeguard itself against a next() that someone else might
          # add later.
          pos() = $-[0] + 1;
      }

      while ( /($regex)/gx ) {
          push @substrings, {
              match  => $1,
              pos    => ( pos() - length $1 ),
              length => length $1,
          };
          # Restart the checking at one character after the match started.
          pos() = $-[0] + 1;
      }
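Putting the two fragments together, a self-contained sketch of the continue-block version might look like this. The string `ABACADA` and the pattern `A.A` are made up for illustration. `$-[0]` is the start offset of the whole match that the while condition just made, so setting `pos($_)` to one past it lets the next iteration find matches that overlap this one.

```perl
use strict;
use warnings;

# Hypothetical input and pattern, for illustration only.
my $regex = 'A.A';
$_ = 'ABACADA';

my @substrings;
while ( /($regex)/gx ) {
    push @substrings, {
        match  => $1,
        pos    => $-[1],         # start offset of capture group 1
        length => length $1,
    };
}
continue {
    # Restart the search one character after the start of this match,
    # so overlapping matches are found. Because this is in continue {},
    # a later "next" in the loop body cannot skip the repositioning.
    pos($_) = $-[0] + 1;
}

print join( ', ', map { $_->{match} . ' at ' . $_->{pos} } @substrings ), "\n";
```

With the reset, the loop reports `ABA` at 0, `ACA` at 2 and `ADA` at 4. Note that it still yields at most one match per starting offset: whichever one the regex engine finds first there.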
        Let me ask you a follow-up question :) I tried the code below:

            $_ = "AHAAHKAKADL";
            while ( /^(.)*(H)(.)*(K)(.)*(D)(.)*$/gx ) {
                print $1 . "\n";
                print $2 . "\n";
                print $3 . "\n";
                print $4 . "\n";
                print $5 . "\n";
                print $6 . "\n";
                print $7 . "\n";
                print "-------\n";
                # Restart the checking at one character after the match started.
                pos() = $-[0] + 1;
            }

        but that didn't work as I expected. What I actually want is to divide the sequence up in all possible combinations around a list of patterns (these can be regexes with wildcards and all sorts of things, and the number of patterns depends on the user input), and to keep the positions and lengths of the matches of all the patterns as well as the substrings in between. So I thought I could maybe put them in a single pattern like this:

            /^(.)*($pat1)(.)*($pat2)(.)*($pat3)(.)*$/

        and match it against the whole sequence to extract the matches and the substrings before, between, and after them. The example above is a simple case where each pattern is a single character. I would really appreciate any new point of view. Thanks!
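Two things work against the single-regex approach: a repeated group such as `(.)*` keeps only the last character it captured (`(.*)` captures the whole gap), and one successful match of an anchored `^...$` pattern yields only one way of splitting the string. The following is a brute-force sketch under our own assumptions, not code from this thread: the helper names `place_patterns` and `fmt_piece` are hypothetical, each pattern is assumed to match at least one character, and patterns are placed in order without overlapping. It tests each pattern, anchored, against every candidate substring, so it is only practical for short sequences.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every way to place the patterns, in order
# and without overlapping, inside $seq. Each result is a list of pieces:
# gap, match, gap, match, ..., final gap. Gaps are plain strings;
# matches are { match, pos, length } hashes.
sub place_patterns {
    my ( $seq, $patterns, $from, $i, $pieces, $results ) = @_;

    if ( $i == @$patterns ) {
        # Every pattern is placed; the rest of the sequence is the last gap.
        push @$results, [ @$pieces, substr( $seq, $from ) ];
        return;
    }

    # Anchor the pattern so it must match the whole candidate substring.
    my $pat = $patterns->[$i];
    my $re  = qr/\A(?:$pat)\z/;

    # Brute force: try every non-empty substring starting at or after $from.
    for my $start ( $from .. length($seq) - 1 ) {
        for my $end ( $start + 1 .. length($seq) ) {
            my $cand = substr( $seq, $start, $end - $start );
            next unless $cand =~ $re;
            place_patterns(
                $seq, $patterns, $end, $i + 1,
                [
                    @$pieces,
                    substr( $seq, $from, $start - $from ),    # gap before
                    { match => $cand, pos => $start, length => $end - $start },
                ],
                $results,
            );
        }
    }
}

# Hypothetical formatter: gaps in quotes, matches in brackets.
sub fmt_piece {
    my ($p) = @_;
    return ref $p ? "[$p->{match} at $p->{pos}]" : "'" . $p . "'";
}

my @results;
place_patterns( 'AHAAHKAKADL', [ 'H', 'K', 'D' ], 0, 0, [], \@results );
for my $r (@results) {
    print join( ' | ', map { fmt_piece($_) } @$r ), "\n";
}
```

For the example sequence this finds four splits (H at 1 or 4, K at 5 or 7, D at 9), each with the gaps before, between and after the matches. Trying every substring keeps the logic simple and also covers patterns that can match several lengths at the same start; a smarter search would be needed for long sequences.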