
I've made just a few changes (mostly to my sub, so that it comes closer to my original design - your modifications added a few unnecessary steps) and run the benchmarks again.

The results seem to support my original suspicion that - at least for this particular problem - a regexp-based solution has to lose the fight against an approach that never has to look behind or ahead, but just touches every possible substring exactly once:

Results for string:

"aabcdabcabcecdecd "

          Rate  blazar  kramba ikegami   lodin     oha
blazar   464/s      --    -77%    -89%    -90%    -90%
kramba  2062/s    344%      --    -52%    -54%    -58%
ikegami 4255/s    817%    106%      --     -4%    -13%
lodin   4444/s    858%    116%      4%      --     -9%
oha     4878/s    951%    137%     15%     10%      --

Results for string:

"aabcdabcabcecdecd aabcdabcabcecdecd "

          Rate  blazar     oha  kramba ikegami   lodin
blazar  92.4/s      --    -85%    -86%    -86%    -87%
oha      610/s    560%      --     -7%     -9%    -14%
kramba   658/s    612%      8%      --     -1%     -7%
ikegami  667/s    621%      9%      1%      --     -6%
lodin    709/s    667%     16%      8%      6%      --

Results for string:

"aabcdabcabcecdecd aabcdabcabcecdecd aabcdabcabcecdecd aabcdabcabcecdecd "

          Rate  blazar   lodin ikegami     oha  kramba
blazar  21.4/s      --    -85%    -86%    -86%    -90%
lodin    144/s    574%      --     -3%     -5%    -35%
ikegami  148/s    594%      3%      --     -2%    -33%
oha      151/s    607%      5%      2%      --    -31%
kramba   220/s    930%     53%     48%     46%      --

Here's the full code I've used for benchmarking.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use constant MIN_LENGTH => 2;
use constant MIN_REPEATS => 2;
use Benchmark qw/:all :hireswallclock/;

my $str='aabcdabcabcecdecd';

sub blazar {
    local $_=shift;
    my $l=length;
    my %count;

    for my $off (0..$l-1) {
        for my $len (MIN_LENGTH .. $l-$off) {
            my $s = substr $_, $off, $len;
            $count{ $s } ||= ()= /$s/g;
        }
        $count{$_} < MIN_REPEATS and
          delete $count{$_} for keys %count;
    }
    \%count;
}

sub oha {
    my $s=shift;
    my %saw;

    while($s =~ /(..+)(?=.*?\1)/g) {
        for my $x (0..length $1) {
            @saw{ map {substr $1, $x, $_} $x+2..length $1 } = ();
        }
    }
    $saw{$_} =()= $s =~ /\Q$_/g for keys %saw;
    \%saw;
}

sub ikegami {
    my $str = shift;

    local our %counts;
    $str =~ /
      (.{2,})   # or (.+)
      (?(?{ !$counts{$1} })(?=.*\1))
      (?{ ++$counts{$1} })
      (?!)
      /x;
    \%counts;
}

sub lodin {
    my $str = shift;

    local our %count;
    $str =~ /
      (.{2,})
      (?(?{ $count{$1} })
          (?!)
      )
      .*
      \1
      (?{ ($count{$1} ||= 1)++ })
      (?!)
      /x;
    \%count;
}


{
    my %count;

    sub kramba {
        my( $string) = @_;
        my $length = length( $string );

        if ($length < MIN_LENGTH) {
            for (keys %count) {
                delete $count{$_}
                    if $count{$_} < MIN_REPEATS;
            }
            return \%count;
        }

        for my $l (MIN_LENGTH..$length) {
            my $s = substr( $string, 0, $l );
            $count{ $s } += 1;
        }

        kramba( substr( $string, 1 ) );
    };
}

for my $multiplier (1, 2, 4) {
    my $work_str = "$str " x $multiplier;

    print "Results for string:\n\n\"$work_str\"\n\n";
    cmpthese 2000/$multiplier => {
        blazar  => sub { blazar $work_str },
        oha     => sub { oha $work_str },
        kramba  => sub { kramba $work_str },
        ikegami => sub { ikegami $work_str },
        lodin   => sub { lodin $work_str },
    }
}

One more remark: if the subs needed changes - for example, so that a string is only counted once it repeats at least MIN_REPEATS times - I'm afraid the regex-based solutions would be rather challenging to modify.
Speaking for myself, I wouldn't know how to do it in the code above, even though I no longer consider myself a novice with regular expressions.
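
A minimal sketch of one possible workaround, assuming it is acceptable to filter the result hash afterwards rather than change the regexes themselves. The helper name at_least is hypothetical and does not appear in the thread; it simply takes any of the hash references returned by the subs above and a threshold.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return a new hash reference
# holding only the entries of $counts whose value is at least $min.
sub at_least {
    my ($counts, $min) = @_;
    my %kept = map  { ($_ => $counts->{$_}) }
               grep { $counts->{$_} >= $min }
               keys %$counts;
    return \%kept;
}
```

It could be applied as at_least( lodin($work_str), MIN_REPEATS ), leaving the regex untouched, at the price of one extra pass over the keys.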

Update: Ahem... some things are broken, and I'll have to find out which ones. Checking the results for simple cases looked fine, so I assumed everything was OK. But when I tried to run the benchmark on the longer text that Oha proposed, I noticed some problems, so I tried to output just the _count_ of strings retained by each sub for a string like

'aabcdabcabcecdecd aabcdabcabcecdecd aabcdabcabcecdecd aabcdabcabcecdecd '

Much to my surprise, that came out as

467,337,791,467,467

for blazar, oha, kramba, ikegami, and lodin respectively. Ooops...

How does the saying go: whoever has one clock knows what time it is; whoever has two clocks has a problem... :)
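
To see where two of the subs disagree, rather than only that they do, their result hashes can be compared key by key. A small sketch; the helper name only_in and the sample hashes are made up for illustration and are not output from the benchmark:

```perl
use strict;
use warnings;

# Hypothetical helper: sorted list (as an array reference) of the keys
# that exist in %$left but are missing from %$right.
sub only_in {
    my ($left, $right) = @_;
    my @missing = grep { !exists $right->{$_} } keys %$left;
    return [ sort @missing ];
}

# Illustrative data only, not results from the subs above.
my %first  = (ab => 2, abc => 2, bc => 2);
my %second = (ab => 2, bc => 2);
print join(', ', @{ only_in(\%first, \%second) }), "\n";   # prints "abc"
```

Running it in both directions, for example only_in(kramba($s), lodin($s)) and the reverse, would list the substrings each sub keeps that the other drops.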

Update 2: With Oha's longer Latin text, the counts are - in the same order as above - 419,244,371,371,371, and my little recursive beauty complains about 'Deep recursion on subroutine "main::kramba" at ./test.pl line 95'. Well, understandable...
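
A minimal sketch of how the same idea could be written without recursion, so that long inputs no longer trigger the warning. The name count_substrings and its optional parameters are assumptions, not part of the benchmark above. It also uses a fresh hash on every call, whereas the %count in the listing above lives outside kramba and is never cleared between calls. Note that this counts overlapping occurrences, which a //g match count does not, so the numbers need not agree with the regex-based subs.

```perl
use strict;
use warnings;

# Hypothetical iterative variant of kramba(); the name and the
# parameter defaults are assumptions, not part of the benchmark.
sub count_substrings {
    my ($string, $min_length, $min_repeats) = @_;
    $min_length  = 2 unless defined $min_length;
    $min_repeats = 2 unless defined $min_repeats;

    my %count;    # fresh for every call, nothing leaks between runs
    my $length = length $string;

    # Visit every (offset, length) pair exactly once; the outer range
    # is empty when the string is shorter than $min_length.
    for my $off (0 .. $length - $min_length) {
        for my $len ($min_length .. $length - $off) {
            $count{ substr $string, $off, $len }++;
        }
    }

    # Drop substrings that occur fewer than $min_repeats times.
    delete @count{ grep { $count{$_} < $min_repeats } keys %count };
    return \%count;
}
```

In the benchmark loop it would take the place of kramba $work_str as count_substrings($work_str); it has not been benchmarked here.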

Krambambuli
---
enjoying Mark Jason Dominus' Higher-Order Perl

In reply to Re^5: how to count the number of repeats in a string (really!) by Krambambuli
in thread how to count the number of repeats in a string (really!) by blazar
