This is a perfect task for the regex engine.
A more generalized version where you can specify the minimum substring length and minimum number of occurances islocal our %count; $str =~ / (.+) # or .{N,} where N is minimum length. (?(?{ $count{$1} }) (?!) ) .* \1 (?{ ($count{$1} ||= 1)++ }) (?!) /x;
my $min_len = 2; # Substring is at least two chars long. my $min_count = 3; # Substring occures at least three times. local our %count; use re 'eval'; $str =~ / (.{$min_len,}) (?(?{ $count{$1} }) (?!) ) (?> .*? \1 ){@{[ $min_count - 2 ]}} .* \1 (?{ ($count{$1} ||= $min_count-1)++ }) (?!) /x;
lodin
Update:
While writing this, ikegami posted a very similar-looking reply. While they look very much alike they work quite differently. ikegami's work by requiring that each match is repeated further into the string, and then goes on to count all those successive matches kind of like a global match. Mine does the counting right away, and then forces the engine to not count those again. So they work rather opposite of each other.
I did a shallow benchmark. It seems that mine is a slight favourite (5-10%) in many, but not all, situations. I also get the impression that ikegami's scales slightly better, see Re^5: how to count the number of repeats in a string (really!).
Update 2:
Added the comment in the regex.
Update 3:
Added the generalized version.
In reply to Re: how to count the number of repeats in a string (really!) [regexp solution]
by lodin
in thread how to count the number of repeats in a string (really!)
by blazar
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |