Your regex /(.{2,}).*\1/g will always try to capture the longest thing it can in $1. In your example string, every "b" character is followed by a "c", so at every position where the string could match /b.*b/, it could also match /bc.*bc/. Since the "bc" version is longer, that's the one the regex engine tries first, and it succeeds. It will never return success with $1 eq "b", even though a "b" character repeats itself in the string.
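A quick way to see the greediness, assuming the same test string used in the code later in this node. With the {2,} quantifier the shortest possible capture is two characters, so here the effect shows up as "abc" winning over "ab" rather than "bc" over "b".

```perl
use strict;
use warnings;

my $str = 'aabcdabcabcecdecd';

# (.{2,}) is greedy: at each start offset the engine tries the longest
# capture first and only shortens it when the rest of the pattern fails.
# At offset 0 nothing succeeds ("aa" never recurs); at offset 1 "abcd"
# does not recur, but "abc" does, so that is what ends up in $1.
my ($first) = $str =~ /(.{2,}).*\1/;
print "\$1 = '$first'\n";    # prints: $1 = 'abc'
```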
I personally believe that this is obvious... now that you point it out... Anyway, I now wonder whether the best thing at this point would be to generate all substrings, e.g. with two nested maps and a uniq-like technique, and then filter out those that occur only once, if one is not interested in them. My attempt at filtering during the generation phase by means of a regex may be fixable somehow, but I can't see an easy way...
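A minimal sketch of the "generate everything, then filter" idea from the paragraph above, assuming the same test string and a MIN of 2 as in the code in this node. It counts non-overlapping occurrences with m//g; counting overlapping ones would need a different pattern.

```perl
use strict;
use warnings;

use constant MIN => 2;    # shortest substring of interest

my $str = 'aabcdabcabcecdecd';
my $l   = length $str;

# Two nested maps generate every substring of length >= MIN;
# the %seen hash is the "uniq-like" step that drops duplicates.
my %seen;
my @subs = grep { !$seen{$_}++ }
           map  { my $off = $_; map { substr($str, $off, $_) } MIN .. $l - $off }
           0 .. $l - MIN;

# Count (non-overlapping) occurrences of each distinct substring,
# keeping only those that occur more than once.
my %count;
for my $s (@subs) {
    my $n = () = $str =~ /\Q$s\E/g;    # \Q...\E: match $s literally
    $count{$s} = $n if $n > 1;
}

print "$_ => $count{$_}\n" for sort keys %count;
```

On the test string this prints ab, abc, bc and cd with a count of 3, and ec and ecd with a count of 2.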
Update: it's also worth noting that m//g does not mean "try to match every possible way this match could succeed". Instead it means "find one match, then resume searching from where that match ended". So in the above, when it matches on "bc", it will not continue backtracking to pick up the match with "b". Instead, it is satisfied that it found a match starting at that position, sets pos to the end of that match, and moves on.
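The resume-at-end behaviour can be observed directly with pos() and @- (again assuming the test string from the code in this node):

```perl
use strict;
use warnings;

my $str = 'aabcdabcabcecdecd';

# List context: m//g returns the $1 of each successive match.
my @caps = $str =~ /(.{2,}).*\1/g;
print "@caps\n";    # prints: abc ecd

# Scalar context shows where each match starts and where pos() ends up:
# the next search resumes at the END of the previous match, so start
# offsets inside an earlier match (like 2, where "bc" recurs) are never tried.
while ($str =~ /(.{2,}).*\1/g) {
    printf "start %d, pos %d, \$1 = %s\n", $-[0], pos($str), $1;
}
```

The loop reports a match starting at 1 with pos 11 and $1 = abc, then one starting at 11 with pos 17 and $1 = ecd, and nothing else.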
But in fact this is exactly why I explicitly set pos. Perl 6 instead provides an adverb for this in the first place (:overlap, for overlapping matches), which is very nice.
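In Perl 5, which has no such adverb, a minimal sketch of the pos trick looks like this: after each success, pos() is moved back to one character past where the match started ($-[0]), so every start offset gets its own attempt. It assumes the same test string as the code in this node.

```perl
use strict;
use warnings;

my $str = 'aabcdabcabcecdecd';

# Emulate overlapping matching: after each success, restart the
# search one character after the START of the match.
my @caps;
while ($str =~ /(.{2,}).*\1/g) {
    push @caps, $1;
    pos($str) = $-[0] + 1;
}
print "@caps\n";    # prints: abc bc cd abc bc ecd cd
```

This yields one capture per start offset that has a match, but still only the longest recurring one there, so shorter repeats such as "ab" at offset 1 are not reported.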
Update: the following, for example, finally works correctly.
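#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;    # stable, sorted output

use constant MIN => 2;          # shortest substring to consider

my $str = 'aabcdabcabcecdecd';

# Return a hashref mapping each substring of length >= MIN to its
# number of (non-overlapping) occurrences, keeping only the repeated ones.
sub count {
    local $_ = shift;
    my $l = length;
    my %count;
    for my $off (0 .. $l - 1) {
        for my $len (MIN .. $l - $off) {
            my $s = substr $_, $off, $len;
            # \Q...\E so regex metacharacters in $s match literally
            $count{$s} ||= () = /\Q$s\E/g;
        }
    }
    # Drop substrings seen only once (once, after both loops).
    $count{$_} == 1 and delete $count{$_} for keys %count;
    \%count;
}

print Dumper count $str;

__END__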
In reply to Re^2: how to count the number of repeats in a string (really!) by blazar, in thread how to count the number of repeats in a string (really!) by blazar