After benchmarking, you're the clear winner:

I personally believe that I'm not! ;)

Anyway, I extended the benchmark to include oha's solution (in my version) and ikegami's. For definiteness I made sure to compare subs that do exactly the same thing: accept a string and return a hashref of the counts, and that the counts are those of strings of length 2 or more repeated 2 or more times. I could not include lodin's solution because I haven't the slightest idea of how to modify it so that only strings of length 2 or more are taken into account. (Short of deleting thos of length 1.) Admittedly, it would be interesting to see how the benchmark goes with different data sets...

#!/usr/bin/perl use strict; use warnings; use Data::Dumper; use constant MIN => 2; use Benchmark qw/:all :hireswallclock/; my $str='aabcdabcabcecdecd'; sub blazar { local $_=shift; my $l=length; my %count; for my $off (0..$l-1) { for my $len (MIN .. $l-$off) { my $s=substr $_, $off, $len; $count{ $s } ||= ()= /$s/g; } $count{$_} == 1 and delete $count{$_} for keys %count; } \%count; } sub oha { my $s=shift; my %saw; while($s =~ /(..+)(?=.*?\1)/g) { for my $x (0..length $1) { @saw{ map {substr $1, $x, $_} $x+2..length $1 } = (); } } $saw{$_} =()= $s =~ /\Q$_/g for keys %saw; \%saw; } { my %count; sub count1 { my( $string) = @_; my $length = length( $string ); if ($length < MIN) { $count{$_} == 1 and delete $count{$_} for keys %count; return \%count; } for my $l (MIN..$length) { my $s = substr( $string, 0, $l ); $count{ $s } += 1; } count1( substr( $string, 1 ) ); } sub kramba { count1 shift; for (keys %count) { delete $count{$_} unless $count{$_} >= 2 and length($_) >= MIN; } \%count; } } sub ikegami { my $str = shift; local our %counts; $str =~ / (.{2,}) # or (.+) (?(?{ !$counts{$1} })(?=.*\1)) (?{ ++$counts{$1} }) (?!) /x; \%counts; } cmpthese 10_000 => { blazar => sub { blazar $str }, oha => sub { oha $str }, kramba => sub { kramba $str }, ikegami => sub { ikegami $str }, }; __END__

Results:

Rate blazar kramba ikegami oha blazar 405/s -- -77% -89% -89% kramba 1788/s 341% -- -52% -52% ikegami 3743/s 823% 109% -- -0% oha 3743/s 823% 109% 0% --

Update: I included lodin's solution as per his direction as follows:

sub lodin { my $str = shift; local our %count; $str =~ / (.{2,}) (?(?{ $count{$1} }) (?!) ) .* \1 (?{ ($count{$1} ||= 1)++ }) (?!) /x; \%count; }

I also corrected the kramba subroutine as per lodin's remark:

sub kramba { my %count; my $count; $count = sub { my( $string) = @_; my $length = length( $string ); if ($length < MIN) { $count{$_} == 1 and delete $count{$_} for keys %count; return \%count; } for my $l (MIN..$length) { my $s = substr( $string, 0, $l ); $count{ $s } += 1; } $count->( substr( $string, 1 ) ); }; $count->(shift); for (keys %count) { delete $count{$_} unless $count{$_} >= 2 and length($_) >= MIN; } \%count; }

The new results are:

Rate blazar kramba oha ikegami lodin blazar 577/s -- -72% -90% -91% -91% kramba 2045/s 254% -- -66% -67% -69% oha 6039/s 946% 195% -- -2% -9% ikegami 6154/s 966% 201% 2% -- -8% lodin 6667/s 1055% 226% 10% 8% --

Update: lodin draws my attention by /msg on the fact that the closure leaks and he directs me to his own Sub::Recursive both for an explanation and a fix. He also notices that he didn't include the manual fix in the docs. Anyway, one can use local our $count; in this particular instance. I'm doing a new benchmark which will be posted in a separate node taking also this into account, although I don't think it will make that much of a difference.


In reply to Re^4: how to count the number of repeats in a string (really!) by blazar
in thread how to count the number of repeats in a string (really!) by blazar

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.