in reply to Match, Capture and get position of multiple patterns in the same string

So very close. What you really mean to do is:

#!/usr/bin/perl
use strict;

my $string = "CATINTHEHATWITHABAT";
my $regex = '\wAT';
my @matches = ();

while ($string =~ /($regex)/gi){
    my $match = $1;
    my $length = length($&);
    my $pos = length($`);
    my $start = $pos + 1;
    my $end = $pos + $length;
    my $hitpos = "$start-$end";
    push @matches, "$match found at $hitpos ";
}

print "$_\n" foreach @matches;

The difference is that a foreach loop builds its whole list before the body runs, whereas the while loop re-executes the expression each time. In your foreach loop the match is evaluated in list context, so every match has already happened by the time the body starts and $& and friends hold only the values from the last one. Using a while loop means the values are fresh on each iteration.
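
To make the difference concrete, here is a minimal sketch that runs the same pattern through both loop styles. It is an illustration rather than the original poster's code: it reads the offsets from @- and @+ instead of $& and $`, which means they are 0-based with an exclusive end, and the array names @foreach_hits and @while_hits are made up for this example.

```perl
use strict;
use warnings;

my $string = "CATINTHEHATWITHABAT";

# foreach: the match runs in list context, so all matches are found
# before the loop body executes; @- and @+ describe only the last one.
my @foreach_hits;
foreach my $match ($string =~ /(\wAT)/g) {
    my ($start, $end) = ($-[0], $+[0]);
    push @foreach_hits, "$match at $start-$end";
}

# while: the match runs in scalar context once per iteration, so the
# offsets belong to the match currently being processed.
my @while_hits;
while ($string =~ /(\wAT)/g) {
    my ($start, $end) = ($-[0], $+[0]);
    push @while_hits, "$1 at $start-$end";
}

print "foreach: $_\n" for @foreach_hits;
print "while:   $_\n" for @while_hits;
```

The foreach version should report the offsets of the last match for every entry, while the while version reports a distinct position for each match.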

Re^2: Match, Capture and get position of multiple patterns in the same string
by richardwfrancis (Beadle) on Nov 13, 2009 at 02:08 UTC

    Firstly, thanks to EVERYONE who has commented here. It's been a great help. I posted this before I went to bed and when I got to work this morning, you guys had solved all my problems! Wish it could be the same every day!

    I'll certainly check out the perlvar link as Fletch and moritz suggest. I always like a good gem!

    Kennethk, you know I tried a while loop cos I thought it should work but it kept hanging. I've since realised that this was down to doing something else in the loop that I didn't tell you guys about as it's a bit silly ;) . Basically I need to substitute each match within the string with a lowercase version of itself, i.e. once the matches, positions etc. are found the string will then go:

    From this: CATINTHEHATWITHABAT
    To this:   catINTHEhatWITHAbat

    I was doing this simply as follows (adapted from the code moritz provided - thanks!):

    while ($string =~ /($regex)/pg){
        my $match = ${^MATCH};
        my $start = $-[0];
        my $end = $+[0];
        my $hitpos = "$start-$end";
        my $lcmatch = lc($match);
        $string =~ s/$match/$lcmatch/g;
        push @matches, "$match found at $hitpos ";
    }

    However, the substitution line seems to cause it to hang and I can't get my head around why, as it should just be operating on the current match, of which there are only 3 in this instance.

    Can anyone make any suggestions other than storing each match string in the loop and doing the substitution separately outside the loop?

    Again, thanks for the help

    Regards, Richard
      The regex match inside the while condition stores its current position in pos($string) (see pos for detail), which the substitution resets, so on the next iteration it starts from the beginning again.

      So if you insist on doing the matching and substitution in two different steps, you have to manually set pos($string) after the substitution:

      use strict;
      use warnings;
      use 5.010;

      my $string = "CATINTHEHATWITHABAT";
      my $regex = qr{\wAT}i;

      while ($string =~ m/($regex)/g){
          my $match = $1;
          my $start = $-[0];
          my $end = $+[0];
          my $hitpos = "$start-$end";
          my $lcmatch = lc($match);
          $string =~ s/$match/$lcmatch/g;
          # in the next iteration start over where we left off
          pos($string) = $end;
          say "$match found at $hitpos ";
      }

      say "string: $string";

      Since you put parentheses around the regex, ${^MATCH} can be replaced by the shorter $1, and there's no need for the /p modifier.
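
The pos reset described above can be observed directly. The following is a small sketch under the same example string, not part of the original reply; the variable names $after_match, $after_subst and $next are invented for the illustration.

```perl
use strict;
use warnings;

my $string = "CATINTHEHATWITHABAT";

# First /g match in scalar context finds CAT and records where it ended.
$string =~ /\wAT/g;
my $after_match = pos($string);

# Modifying the string clears the stored search position.
$string =~ s/CAT/cat/;
my $after_subst = pos($string);

# Restore the position by hand, then continue matching from there.
pos($string) = $after_match;
$string =~ /(\wAT)/g;
my $next = $1;

print "pos after match: $after_match\n";
print "pos after s///:  ", (defined $after_subst ? $after_subst : 'undef'), "\n";
print "next match:      $next\n";
```

Setting pos by hand like this is only safe here because the replacement has the same length as the text it replaces; a replacement of a different length would shift the later offsets.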


        Thanks moritz that makes perfect sense. I didn't realise the substitution would reset this. I've implemented that now, thank you.

        Do I need to distribute points or anything in this forum?

        Cheers,

        Rich
      If you want to substitute at the same time as the matching, the following may work for you:

      #!/usr/bin/perl
      use strict;

      my $string = "CATINTHEHATWITHABAT";
      my $regex = '\wAT';
      my @matches = ();

      while ($string =~ s/($regex)/lc($1)/e){
          my $match = $1;
          my $length = length($&);
          my $pos = length($`);
          my $start = $pos + 1;
          my $end = $pos + $length;
          my $hitpos = "$start-$end";
          push @matches, "$match found at $hitpos ";
      }

      print "$_\n" foreach @matches;
      print "$string\n";

      I've used the e modifier (see perlretut) so that the replacement lc($1) is evaluated as Perl code, which yields the lower-cased version of the matched string.

      Caveat: This becomes an infinite loop if you use the i modifier, since you will continuously overwrite the first occurrence of 'cat'. If you need case insensitivity, perhaps you'd want my $regex = '[A-Z][aA][tT]|[a-z][aA]T|[a-z]At'; or equivalent for your real case.
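
One possible way around the caveat, sketched here as an alternative rather than as the approach used above, is to do everything in a single s///ge pass. With /g the engine moves through the string once and never rescans text it has already replaced, so the i modifier no longer causes a loop. The positions are taken from @- and @+ inside the replacement code and converted to the 1-based inclusive form used in the code above.

```perl
use strict;
use warnings;

my $string = "CATINTHEHATWITHABAT";
my $regex  = qr/\wAT/i;
my @matches;

# /g walks the string once; /e runs the replacement block as code for
# each match. The block records the position, then returns the
# lower-cased match as the replacement text.
$string =~ s{($regex)}{
    my ($start, $end) = ($-[0] + 1, $+[0]);
    push @matches, "$1 found at $start-$end";
    lc $1;
}ge;

print "$_\n" for @matches;
print "$string\n";
```

Because the replacement always has the same length as the match, the recorded positions are valid for both the original and the modified string.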

        Cool. Cheers!