Re: Match, Capture and get position of multiple patterns in the same string

Re^2: Match, Capture and get position of multiple patterns in the same string by richardwfrancis (Beadle) on Nov 13, 2009 at 02:08 UTC
Firstly, thanks to EVERYONE who has commented here. It's been a great help. I posted this before I went to bed, and when I got to work this morning you guys had solved all my problems! Wish it could be the same every day!

I'll certainly check out the perlvar link as Fletch and moritz suggest. I always like a good gem!

Kennethk, I did try a while loop because I thought it should work, but it kept hanging. I've since realised that this was down to something else I was doing in the loop that I didn't tell you guys about, as it's a bit silly ;). Basically, I need to replace each match within the string with a lowercase version of itself, i.e. once the matches, positions etc. are found, the string should go

From this: CATINTHEHATWITHABAT
To this: catINTHEhatWITHAbat

I was doing this simply as follows (adapted from the code moritz provided - thanks!):

    while ($string =~ /($regex)/pg){
        my $match = ${^MATCH};
        my $start = $-[0];
        my $end = $+[0];
        my $hitpos = "$start-$end";
        my $lcmatch = lc($match);
        $string =~ s/$match/$lcmatch/g;
        push @matches, "$match found at $hitpos ";
    }

However, the substitution line seems to cause it to hang, and I can't get my head around why, as it should just be operating on the current match, of which there are only 3 in this instance.

Can anyone make any suggestions other than storing each match string in the loop and doing the substitution separately outside the loop?

Again, thanks for the help.

Regards,
Richard
Re^3: Match, Capture and get position of multiple patterns in the same string by moritz (Cardinal) on Nov 13, 2009 at 07:42 UTC
The regex match inside the `while` condition stores its current position in `pos($string)` (see the documentation for pos for details). The substitution modifies the string, which resets that position, so on the next iteration the match starts from the beginning again.

So if you insist on doing the matching and substitution in two different steps, you have to manually set `pos($string)` after the substitution:

    use strict;
    use warnings;
    use 5.010;

    my $string = "CATINTHEHATWITHABAT";
    my $regex = qr{\wAT}i;

    while ($string =~ m/($regex)/g){
        my $match = $1;
        my $start = $-[0];
        my $end = $+[0];
        my $hitpos = "$start-$end";
        my $lcmatch = lc($match);
        $string =~ s/$match/$lcmatch/g;
        # in the next iteration start over where we left off
        pos($string) = $end;
        say "$match found at $hitpos ";
    }
    say "string: $string";

Since you put parentheses around the regex, `${^MATCH}` can be replaced by the shorter `$1`, and there's no need for the /p modifier.
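For illustration, here is a minimal sketch of the same "restore `pos` after modifying the string" idea. It replaces the inner `s///g` with an lvalue `substr`, so only the span that was just matched is touched and the matched text is never reinterpreted as a pattern. The sub name `lc_matches_in_place` is hypothetical, and the sketch assumes the regex never matches the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): lowercase each match of $regex
# and report it with its 0-based start offset and the offset just past its end.
# Assumes $regex cannot match the empty string.
sub lc_matches_in_place {
    my ($string, $regex) = @_;
    my @hits;
    while ($string =~ /$regex/g) {
        # Save the offsets before the string is modified.
        my ($start, $end) = ($-[0], $+[0]);
        my $match = substr($string, $start, $end - $start);
        push @hits, "$match found at $start-$end";
        # Modifying the string resets pos($string) ...
        substr($string, $start, $end - $start) = lc $match;
        # ... so put it back to continue after the current match.
        pos($string) = $end;
    }
    return ($string, @hits);
}

my ($result, @hits) = lc_matches_in_place("CATINTHEHATWITHABAT", qr/\wAT/i);
print "$_\n" for @hits;
print "string: $result\n";
```

Because the replacement has the same length as the match, the saved `$end` is still the right place to resume from.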
Re^4: Match, Capture and get position of multiple patterns in the same string by richardwfrancis (Beadle) on Nov 13, 2009 at 08:33 UTC
Thanks moritz that makes perfect sense. I didn't realise the substitution would reset this. I've implemented that now, thank you. Do I need to distribute points or anything in this forum? Cheers, Rich
Re^5: Match, Capture and get position of multiple patterns in the same string by moritz (Cardinal) on Nov 13, 2009 at 08:57 UTC
Re^3: Match, Capture and get position of multiple patterns in the same string by kennethk (Abbot) on Nov 13, 2009 at 16:03 UTC
If you want to substitute at the same time as the matching, the following may work for you:

    #!/usr/bin/perl
    use strict;

    my $string = "CATINTHEHATWITHABAT";
    my $regex = '\wAT';
    my @matches = ();

    # No /g here: each pass replaces the first remaining uppercase match
    while ($string =~ s/($regex)/lc($1)/e){
        my $match = $1;
        my $length = length($&);
        my $pos = length($`);
        my $start = $pos + 1;       # 1-based start position
        my $end = $pos + $length;   # 1-based position of the last character
        my $hitpos = "$start-$end";
        push @matches, "$match found at $hitpos ";
    }

    print "$_\n" foreach @matches;
    print "$string\n";

I've used the e modifier (see perlretut) to evaluate the replacement as code, so that each match is replaced by its lower-cased version.

Caveat: This becomes an infinite loop if you use the i modifier, since you will continuously overwrite the first occurrence of 'cat'. If you need case insensitivity, perhaps you'd want `my $regex = '[A-Z][aA][tT]|[a-z][aA]T|[a-z]At';` or equivalent for your real case. That pattern matches every case variant except the all-lowercase one, so text that has already been converted is not matched again.
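One hedged alternative to the explicit alternation, for the case-insensitive situation: do everything in a single `s///ge` pass and record the positions inside the replacement code. With /g the substitution continues after each replacement and does not rescan text it has already replaced, so /i should not loop here. The sub name `lc_matches` is hypothetical, and the offsets are 0-based from `@-` and `@+` rather than the 1-based positions used above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): lowercase every match of $regex
# in one s///ge pass, collecting each match with its 0-based offsets.
sub lc_matches {
    my ($string, $regex) = @_;
    my @matches;
    $string =~ s{($regex)}{
        my ($start, $end) = ($-[0], $+[0]);
        push @matches, "$1 found at $start-$end";
        lc $1;    # value of the last statement becomes the replacement
    }ge;
    return ($string, @matches);
}

my ($result, @matches) = lc_matches("CATINTHEHATWITHABAT", qr/\wAT/i);
print "$_\n" foreach @matches;
print "$result\n";
```

The trade-off is that the bookkeeping lives inside the replacement block, which some people find harder to read than an explicit while loop.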
Re^4: Match, Capture and get position of multiple patterns in the same string by richardwfrancis (Beadle) on Nov 16, 2009 at 06:56 UTC
Cool. Cheers!