Re: Match, Capture and get position of multiple patterns in the same string
by moritz (Cardinal) on Nov 12, 2009 at 15:16 UTC
|
I highly recommend reading perlvar about all those regex match variables, it's full of "hidden" gems.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $string = "CATINTHEHATWITHABAT";
my $regex = qr{\wAT}i;
my @matches = ();
while ($string =~ /($regex)/pg){
my $start = $-[0];
my $end = $+[0];
my $hitpos = "$start-$end";
say "${^MATCH} found at $hitpos";
}
Perl 6 - links to (nearly) everything that is Perl 6.
| [reply] [d/l] |
Re: Match, Capture and get position of multiple patterns in the same string
by Fletch (Bishop) on Nov 12, 2009 at 15:14 UTC
|
Probably @- and @+ (covered in perlvar) will get you what you want without your having to resort to computing them.
The cake is a lie.
The cake is a lie.
The cake is a lie.
| [reply] |
Re: Match, Capture and get position of multiple patterns in the same string
by kennethk (Abbot) on Nov 12, 2009 at 15:13 UTC
|
So very close. What you really mean to do is:
#!/usr/bin/perl
use strict;
my $string = "CATINTHEHATWITHABAT";
my $regex = '\wAT';
my @matches = ();
while ($string =~ /($regex)/gi){
my $match = $1;
my $length = length($&);
my $pos = length($`);
my $start = $pos + 1;
my $end = $pos + $length;
my $hitpos = "$start-$end";
push @matches, "$match found at $hitpos ";
}
print "$_\n" foreach @matches;
The difference is that a foreach loop builds the list before you start, whereas the while loop re-executes the expression each time. This means that you are clobbering $& and friends at the start of your foreach loop, but using a while loop means the values are fresh. | [reply] [d/l] [select] |
|
|
Firstly, thanks to EVERYONE who has commented here. It's been a great help. I posted this before I went to bed and when I get to work this morning, you guys had solved all my problems! Wish it could be the same every day!
I'll certainly check out the perlvar link as Fletch and moritz suggest. I always like a good gem!
Kennethk, you know I tried a while loop cos I thought it should work but it kept hanging. I've since realised that this was down to doing something else in the loop that I didn't tell you guys about as it's a bit silly ;) . Basically I have the need to substitute the match within the string with a lowercase version of itself ie once the matches, positions etc are found the string will then go:
From this: CATINTHEHATWITHABAT
To this: catINTHEhatWITHAbat
I was doing this simply as follows (adapted from the code moritz provided - thanks!):
while ($string =~ /($regex)/pg){
my $match = ${^MATCH};
my $start = $-[0];
my $end = $+[0];
my $hitpos = "$start-$end";
my $lcmatch = lc($match);
$string =~ s/$match/$lcmatch/g;
push @matches, "$match found at $hitpos ";
}
However the substitution line seems to cause it to hang and I can't get my head around why as it should just be operating on the current match of which there are only 3 in this instance
Can anyone make any suggestions other than storing each match string in the loop and doing the substitution separately outside the loop.
Again, thanks for the help
Regards, Richard | [reply] [d/l] |
|
|
use strict;
use warnings;
use 5.010;
my $string = "CATINTHEHATWITHABAT";
my $regex = qr{\wAT}i;
while ($string =~ m/($regex)/g){
my $match = $1;
my $start = $-[0];
my $end = $+[0];
my $hitpos = "$start-$end";
my $lcmatch = lc($match);
$string =~ s/$match/$lcmatch/g;
# in the next iteration start over where we left off
pos($string) = $end;
say "$match found at $hitpos ";
}
say "string: $string";
Since you put parenthesis around the regex, ${^MATCH} can be replaced by the shorter $1, and there's no need for the /p modifier.
Perl 6 - links to (nearly) everything that is Perl 6.
| [reply] [d/l] [select] |
|
|
|
|
|
|
If you want to substitute at the same time as the matching, the following may work for you:
#!/usr/bin/perl
use strict;
my $string = "CATINTHEHATWITHABAT";
my $regex = '\wAT';
my @matches = ();
while ($string =~ s/($regex)/lc($1)/e){
my $match = $1;
my $length = length($&);
my $pos = length($`);
my $start = $pos + 1;
my $end = $pos + $length;
my $hitpos = "$start-$end";
push @matches, "$match found at $hitpos ";
}
print "$_\n" foreach @matches;
print "$string\n";
I've used the e modifier (see perlretut) to evaluate the lower-case transliteration of the matched string.
Caveat: This becomes an infinite loop if you use the i modifier, since you will continuously overwrite the first occurrence of 'cat'. If need case insensitivity, perhaps you'd want my $regex = '[A-Z][aA][tT]|[a-z][aA]T|[a-z]At'; or equivalent for your real case. | [reply] [d/l] [select] |
|
|
Re: Match, Capture and get position of multiple patterns in the same string
by JavaFan (Canon) on Nov 12, 2009 at 16:25 UTC
|
Note that all solutions presented sofar will only return non-overlapping greedy matches, not all matches as the requirement is. Say for instance, you're searching for fofo in fofofo. I count two matches, but anything using while ("fofofo" =~ /fofo/g) will only find 1. And if the pattern is (?:fo)+ there are 6 potential matches (3 from position 0, 2 from position 2, 1 from position 4).
Perhaps you are only interested in non-overlapping greedy matches, but that wasn't clear to me.
$_ = "fofofo";
/((?:fo)+)(?{ say "[$-[1], $+[1], $1]" })(*FAIL)/;
__END__
[0, 6, fofofo]
[0, 4, fofo]
[0, 2, fo]
[2, 6, fofo]
[2, 4, fo]
[4, 6, fo]
| [reply] [d/l] [select] |
Re: Match, Capture and get position of multiple patterns in the same string
by 7stud (Deacon) on Nov 12, 2009 at 15:50 UTC
|
use strict;
use warnings;
use 5.010;
my $string = "CATATHAT";
my $regex = '\wAT';
say 0..9; #print ruler
say $string, "\n"; #print original string
while ($string =~ /$regex/gi) {
my $match_len = length($&);
my $start = length($`);
my $end = $start + $match_len - 1;
say "match: $& ", "start: $start ", "end: $end";
}
--output:--
0123456789
CATATHAT
match: CAT start: 0 end: 2
match: HAT start: 5 end: 7
| [reply] [d/l] |
|
|
Sorry, I refreshed the page after working on a solution, and it said 0 responses still.
| [reply] |
Re: Match, Capture and get position of multiple patterns in the same string
by Marshall (Canon) on Nov 12, 2009 at 16:03 UTC
|
I was a bit confused about the requirements, but I think this code does it.This code does something. I'm not sure if it does what you want.
#!/usr/bin/perl -w
use strict;
my @strings = ("CATINTHEHATWITHABAT",
"WERWERWAT134wAt");
my $regex = '\wAT';
foreach my $string (@strings)
{
while ( $string =~ m/($regex)/gi)
{
printf "%s found at pos %2d in %s\n",
$1, pos($string)-length($1)+1, $string;
}
}
__END__
Prints:
CAT found at pos 1 in CATINTHEHATWITHABAT
HAT found at pos 9 in CATINTHEHATWITHABAT
BAT found at pos 17 in CATINTHEHATWITHABAT
WAT found at pos 7 in WERWERWAT134wAt
wAt found at pos 13 in WERWERWAT134wAt
An update:
#!/usr/bin/perl -w
use strict;
my @strings = ("CATINTHEHATWITHABAT",
"WERWERWAT134wAtThatat",
"WERWERWAT134wAtThattat");
my $regex = '\wAT';
foreach my $string (@strings)
{
while ( $string =~ m/($regex)/gi)
{
printf "%s found at pos %2d-%-2d in %s\n",
$1, pos($string)-length($1)+1, pos($string),
$string;
}
}
__END__
Prints:
CAT found at pos 1-3 in CATINTHEHATWITHABAT
HAT found at pos 9-11 in CATINTHEHATWITHABAT
BAT found at pos 17-19 in CATINTHEHATWITHABAT
WAT found at pos 7-9 in WERWERWAT134wAtThatat
wAt found at pos 13-15 in WERWERWAT134wAtThatat
hat found at pos 17-19 in WERWERWAT134wAtThatat
WAT found at pos 7-9 in WERWERWAT134wAtThattat
wAt found at pos 13-15 in WERWERWAT134wAtThattat
hat found at pos 17-19 in WERWERWAT134wAtThattat
tat found at pos 20-22 in WERWERWAT134wAtThattat
Note: ranges do not "overlap", see "...Thattat"
| [reply] [d/l] [select] |
Re: Match, Capture and get position of multiple patterns in the same string
by johngg (Canon) on Nov 13, 2009 at 00:00 UTC
|
Using a look-ahead will allow for overlapping patterns. However, @+ contains the same values as corresponding elements of @- when look-aheads are used so sums are necessary.
$ perl -e '
> $str = q{CATINTHEHATWITHABATATINAAAT};
> $rex = qr{(?=([^A]A+T))};
> printf
> qq{Found %s, length %d at offset %d to %d\n},
> $1,
> length $1,
> $-[ 0 ],
> $-[ 0 ] + length( $1 ) - 1
> while $str =~ m{$rex}g;'
Found CAT, length 3 at offset 0 to 2
Found HAT, length 3 at offset 8 to 10
Found BAT, length 3 at offset 16 to 18
Found TAT, length 3 at offset 18 to 20
Found NAAAT, length 5 at offset 22 to 26
$
I hope this is of interest.
| [reply] [d/l] [select] |