comment on

Your regex /(.{2,}).*\1/g will always try to capture the largest thing it can in $1. In your example string, every "b" character is followed by a "c". So every position where the string could match /b.*b/, it could also match /bc.*bc/. Since the "bc" version is longer, that's the one that will be tried first by the regex engine, and will return with success. It will never return success with $1 eq "b", even though a "b" character repeats itself in the string.

I personally believe that this obvious... now that you point it out... Anyway I now wonder if at this point the best thing could be to generate all substrings e.g. with two nested maps and a uniq-like technique and possibly filter out those that have a count of 1 if one is not interested in them. My approach at a filtering in the generation phase by means of a regex may be fixable somehow but I can't see an easy way...

Update: it's also worth noting that m//g does not mean "try to match every possible way this match could succeed". Instead it means, "try to find one match starting at each position in the string" .. So in the above, when it matches on "bc", it will not continue backtracking to pick up the match with "b". Instead, it will be satisfied that it found a match starting at that position, increment pos, and move on.

But in fact this is the reason why I explicitly set pos. Perl 6 provides an adverb to do so in the first place instead -matching with superimpositions-, which is very good.

Update: the following, for example, finally works really correctly.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use constant MIN => 2;

my $str='aabcdabcabcecdecd';

sub count {
    local $_=shift;
    my $l=length;
    my %count;
    
    for my $off (0..$l-1) {
        for my $len (MIN .. $l-$off) {
            my $s=substr $_, $off, $len;
            $count{ $s } ||= ()= /$s/g;
        }
        $count{$_} == 1 and
          delete $count{$_} for keys %count;
    }
    \%count;
}

print Dumper count $str;

__END__
[download]

In reply to Re^2: how to count the number of repeats in a string (really!) by blazar
in thread how to count the number of repeats in a string (really!) by blazar

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.