Firstly, thanks to EVERYONE who has commented here. It's been a great help. I posted this before I went to bed, and when I got to work this morning you guys had solved all my problems! Wish it could be the same every day!

I'll certainly check out the perlvar link as Fletch and moritz suggest. I always like a good gem!

Kennethk, you know I tried a while loop cos I thought it should work, but it kept hanging. I've since realised that this was down to doing something else in the loop that I didn't tell you guys about, as it's a bit silly ;). Basically, I need to replace each match within the string with a lowercase version of itself, i.e. once the matches, positions, etc. are found, the string should go:

    From this: CATINTHEHATWITHABAT
    To this:   catINTHEhatWITHAbat

I was doing this simply as follows (adapted from the code moritz provided - thanks!):

    while ($string =~ /($regex)/pg) {
        my $match   = ${^MATCH};
        my $start   = $-[0];
        my $end     = $+[0];
        my $hitpos  = "$start-$end";
        my $lcmatch = lc($match);
        $string =~ s/$match/$lcmatch/g;
        push @matches, "$match found at $hitpos ";
    }

However, the substitution line seems to cause it to hang, and I can't get my head around why, as it should only be operating on the current match, of which there are only three in this instance.

Can anyone make any suggestions other than storing each match string in the loop and doing the substitution separately outside the loop?

Again, thanks for the help.

Regards, Richard

Re^3: Match, Capture and get position of multiple patterns in the same string
by moritz (Cardinal) on Nov 13, 2009 at 07:42 UTC
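    The regex match inside the while condition stores its current position in pos($string) (see pos for details). The substitution resets that position, so on the next iteration the match starts from the beginning of the string again. If the pattern still matches the lowercased text (for example, because it is case-insensitive), the loop finds the same first match over and over and never terminates.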
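A small sketch (not code from the thread) that makes the reset visible: a //g match in scalar context leaves pos() at the end of the match, and a subsequent s/// on the same variable discards that position.

```perl
use strict;
use warnings;

my $string = "CATINTHEHATWITHABAT";

# A //g match in scalar (boolean) context records where it stopped.
if ($string =~ m/\wAT/g) {
    print "after match: pos = ", pos($string), "\n";
}

# Modifying the string with s/// throws that position away.
$string =~ s/CAT/cat/;
print "after s///: pos is ",
    (defined pos($string) ? pos($string) : "undef"), "\n";
```

Because pos() is undefined again, the next //g match in a while loop would start scanning from offset 0.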

    So if you insist on doing the matching and substitution in two different steps, you have to manually set pos($string) after the substitution:

        use strict;
        use warnings;
        use 5.010;

        my $string = "CATINTHEHATWITHABAT";
        my $regex  = qr{\wAT}i;

        while ($string =~ m/($regex)/g) {
            my $match   = $1;
            my $start   = $-[0];
            my $end     = $+[0];
            my $hitpos  = "$start-$end";
            my $lcmatch = lc($match);
            $string =~ s/$match/$lcmatch/g;
            # in the next iteration start over where we left off
            pos($string) = $end;
            say "$match found at $hitpos ";
        }
        say "string: $string";

    Since you put parentheses around the regex, ${^MATCH} can be replaced by the shorter $1, and there's no need for the /p modifier.
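One possible variant of the loop above (a sketch, not code from the thread): instead of s/$match/$lcmatch/g, which lowercases every copy of the matched text and interpolates $match as a pattern, it overwrites only the current occurrence using 4-argument substr at the offsets from $-[0] and $+[0]. It still restores pos($string) explicitly afterwards, since writing to the string can reset it.

```perl
use strict;
use warnings;
use 5.010;

my $string = "CATINTHEHATWITHABAT";
my $regex  = qr{\wAT}i;
my @matches;

while ($string =~ m/($regex)/g) {
    my $match = $1;
    my ($start, $end) = ($-[0], $+[0]);

    # Replace exactly $end - $start characters at $start with the
    # lowercased match; other copies of the same text are untouched.
    substr($string, $start, $end - $start, lc $match);

    # Modifying the string can reset pos(), so restore it explicitly.
    pos($string) = $end;

    push @matches, "$match found at $start-$end";
}

say for @matches;
say "string: $string";
```

Since lc does not change the length of the match, the offsets stay valid after each replacement.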

    Perl 6 - links to (nearly) everything that is Perl 6.

      Thanks, moritz, that makes perfect sense. I didn't realise the substitution would reset this. I've implemented that now, thank you.

      Do I need to distribute points or anything in this forum?

      Cheers,

      Rich
        Do I need to distribute points or anything in this forum?

        You don't need to do anything, but it's considered good style to upvote helpful replies and good questions, once you've got some votes.

        See also Voting/Experience System and The Role of XP in PerlMonks.

Re^3: Match, Capture and get position of multiple patterns in the same string
by kennethk (Abbot) on Nov 13, 2009 at 16:03 UTC
    If you want to substitute at the same time as the matching, the following may work for you:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $string  = "CATINTHEHATWITHABAT";
        my $regex   = '\wAT';
        my @matches = ();

        # Each pass lowercases the first remaining uppercase match.
        while ($string =~ s/($regex)/lc($1)/e) {
            my $match  = $1;
            my $length = length($&);
            my $pos    = length($`);
            # 1-based, inclusive positions (unlike $-[0]/$+[0])
            my $start  = $pos + 1;
            my $end    = $pos + $length;
            my $hitpos = "$start-$end";
            push @matches, "$match found at $hitpos ";
        }
        print "$_\n" foreach @matches;
        print "$string\n";

    I've used the e modifier (see perlretut) so that the replacement lc($1) is evaluated as Perl code, which replaces the matched string with its lowercase version.
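To illustrate the e modifier (a sketch, not code from the thread): with /e the replacement is run as code, so lc($1) is actually called; the same result can be had without /e by using the \L...\E case escape inside the replacement string. Both substitutions below use /g to lowercase every match in one go.

```perl
use strict;
use warnings;

my $regex = '\wAT';

# With /e the replacement part is executed as Perl code.
(my $with_e = "CATINTHEHATWITHABAT") =~ s/($regex)/lc($1)/ge;

# Without /e, \L ... \E lowercases the interpolated capture.
(my $with_L = "CATINTHEHATWITHABAT") =~ s/($regex)/\L$1\E/g;

print "$with_e\n$with_L\n";
```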

    Caveat: This becomes an infinite loop if you use the i modifier, since each pass will keep overwriting the first occurrence of 'cat'. If you need case insensitivity, perhaps you'd want my $regex = '[A-Z][aA][tT]|[a-z][aA]T|[a-z]At'; (every case combination except all-lowercase) or the equivalent for your real case.
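If case insensitivity is needed, another option (a sketch, not code from the thread) is to do everything in a single s///ge pass. With /g the substitution scans left to right and never revisits text it has already replaced, so a pattern with the i modifier cannot loop; inside the /e block, $1, $-[0] and $+[0] describe the current match. Note that this reports 0-based start and exclusive end offsets, like moritz's code, rather than the 1-based positions above.

```perl
use strict;
use warnings;

my $string = "CATINTHEHATWITHABAT";
my $regex  = qr{\wAT}i;    # case-insensitive is safe here
my @matches;

# One pass: record each match and its offsets, then return the
# lowercased text as the replacement (the block's last value).
$string =~ s{($regex)}{
    my ($start, $end) = ($-[0], $+[0]);
    push @matches, "$1 found at $start-$end";
    lc $1;
}ge;

print "$_\n" for @matches;
print "$string\n";
```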

      Cool. Cheers!