Inspired by today's how to count the no of repeats in a string, I came up with an admittedly artificial but possibly very interesting problem: that of counting for a string the repetition count of each substring that is actually repeated. For example, given the string
my $str = 'aabcdabcabcecdecd';
I see that -limiting ourselves to substrings of length 2 or more- we have the following counts:
(abc => 3, bc => 3, cd => 3, ecd => 2)
One possible way to get those counts is as follows -I'm casting the thing in the form of a sub-:
sub count1 { local $_=shift; my %count; for my $pos (0..(length)-1) { pos=$pos; @count{ /(.{2,}).*\1/g } = (); } for my $k (keys %count) { $count{$k} =()= /$k/g; } \%count; }
In fact, print Dumper count1 $str; gives me:
$VAR1 = { 'cd' => 3, 'ecd' => 2, 'abc' => 3, 'bc' => 3 };
To my unpleasant surprise, if I change .{2,} to .+ I do not get all substrings (of length 1):
$VAR1 = { 'cd' => 3, 'c' => 5, 'ecd' => 2, 'a' => 4, 'abc' => 3, 'd' => 3, 'bc' => 3 };
And I can't understand why...
I personally believe that map based solutions are often very cute and so I rolled my own too:
sub count2 { my $s=shift; my %saw; my %count = map { map { $saw{$_}++ ? () : $_ => scalar(()= $s =~ /$_/g); } do { pos=$_; $s =~ /(.{2,}).*\1/g }; } 0..length($s)-1; \%count; }
But the latter
$VAR1 = { 'ecd' => 2, 'abc' => 3, '3' => 2 };
And I can't understand why, either.
What's your take on the problem?
In reply to how to count the number of repeats in a string (really!) by blazar
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |