comment on

After benchmarking, you're the clear winner:

Anyway, I extended the benchmark to include oha's solution (in my version) and ikegami's. For definiteness I made sure to compare subs that do exactly the same thing: accept a string and return a hashref of the counts, and that the counts are those of strings of length 2 or more repeated 2 or more times. I could not include lodin's solution because I haven't the slightest idea of how to modify it so that only strings of length 2 or more are taken into account. (Short of deleting thos of length 1.) Admittedly, it would be interesting to see how the benchmark goes with different data sets...

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use constant MIN => 2;
use Benchmark qw/:all :hireswallclock/;

my $str='aabcdabcabcecdecd';

sub blazar {
    local $_=shift;
    my $l=length;
    my %count;
    
    for my $off (0..$l-1) {
        for my $len (MIN .. $l-$off) {
            my $s=substr $_, $off, $len;
            $count{ $s } ||= ()= /$s/g;
        }
        $count{$_} == 1 and
          delete $count{$_} for keys %count;
    }
    \%count;
}

sub oha {
    my $s=shift;
    my %saw;

    while($s =~ /(..+)(?=.*?\1)/g) {
        for my $x (0..length $1) {
            @saw{ map {substr $1, $x, $_} $x+2..length $1 } = ();
        }
    }
    $saw{$_} =()= $s =~ /\Q$_/g for keys %saw;
    \%saw;
}

{
    my %count;
    sub count1 {
        my( $string) = @_;
        my $length = length( $string );
        
        if ($length < MIN) {
            $count{$_} == 1 and delete $count{$_} for keys %count;
            return \%count;
        }
        
        for my $l (MIN..$length) {
            my $s = substr( $string, 0, $l );
            $count{ $s } += 1;
        }
        count1( substr( $string, 1 ) );
    }

    sub kramba {
        count1 shift;
        for (keys %count) {
            delete $count{$_} unless
              $count{$_} >= 2 and length($_) >= MIN;
        }
        \%count;
    }
}

sub ikegami {
    my $str = shift;

    local our %counts;
    $str =~ /
      (.{2,})   # or (.+)
      (?(?{ !$counts{$1} })(?=.*\1))
      (?{ ++$counts{$1} })
      (?!)
      /x;
    \%counts;
}

cmpthese 10_000 => {
    blazar  => sub { blazar $str },
    oha     => sub { oha $str },
    kramba  => sub { kramba $str },
    ikegami => sub { ikegami $str },
};
    
__END__
[download]

Results:

          Rate  blazar  kramba ikegami     oha
blazar   405/s      --    -77%    -89%    -89%
kramba  1788/s    341%      --    -52%    -52%
ikegami 3743/s    823%    109%      --     -0%
oha     3743/s    823%    109%      0%      --
[download]

Update: I included lodin's solution as per his direction as follows:

sub lodin {
    my $str = shift;

    local our %count;
    $str =~ /
      (.{2,})
      (?(?{ $count{$1} })
          (?!)
      )
      .*
      \1
      (?{ ($count{$1} ||= 1)++ })
      (?!)
      /x;
    \%count;
}
[download]

I also corrected the kramba subroutine as per lodin's remark:

sub kramba {
    my %count;
    my $count;
    $count = sub {
        my( $string) = @_;
        my $length = length( $string );
        
        if ($length < MIN) {
            $count{$_} == 1 and delete $count{$_} for keys %count;
            return \%count;
        }
        
        for my $l (MIN..$length) {
            my $s = substr( $string, 0, $l );
            $count{ $s } += 1;
        }
        $count->( substr( $string, 1 ) );
    };
    $count->(shift);
    for (keys %count) {
        delete $count{$_} unless
          $count{$_} >= 2 and length($_) >= MIN;
    }
    \%count;
}
[download]

The new results are:

          Rate  blazar  kramba     oha ikegami   lodin
blazar   577/s      --    -72%    -90%    -91%    -91%
kramba  2045/s    254%      --    -66%    -67%    -69%
oha     6039/s    946%    195%      --     -2%     -9%
ikegami 6154/s    966%    201%      2%      --     -8%
lodin   6667/s   1055%    226%     10%      8%      --
[download]

Update: lodin draws my attention by /msg on the fact that the closure leaks and he directs me to his own Sub::Recursive both for an explanation and a fix. He also notices that he didn't include the manual fix in the docs. Anyway, one can use local our $count; in this particular instance. I'm doing a new benchmark which will be posted in a separate node taking also this into account, although I don't think it will make that much of a difference.

In reply to Re^4: how to count the number of repeats in a string (really!) by blazar
in thread how to count the number of repeats in a string (really!) by blazar

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.