
Here's a method that doesn't require you to first determine what is being repeated (it only assumes the repeated substring is three characters long):

use strict;
use warnings;
use diagnostics;

our @found;

#              0         1         2         3         4         5
#              012345678901234567890123456789012345678901234567890
my $string = q'abczdefzabcghijklzabczaerabrtyuabcethdauthabkudiabc';
#              abc     abc       abc          abc              abc
#              0       8         18           31               48

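if (
  $string =~ m/
                (.{3})        # capture any three characters as the candidate
                (?:
                  .*?         # skip ahead as little as possible
                  (\1)        # the candidate appears again
                  (?{ push @found, pos() - length($^N); })   # record where that repeat starts
                )+
             /x
) {
  print "$1: @found\n";   # prints "abc: 8 18 31 48"
}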

Note the placement of the (?{...}) code after the \1 backreference. The code block only runs once the backreference has matched, so @found records only the branches that actually worked out, not the backtracked dead ends the engine tried along the way. Because the push comes after the backreference has matched, pos() is already at the end of the repeat, so you have to subtract the length of the submatch ($^N, the most recently closed capture group) to get its starting position. The first occurrence, at offset 0, is what $1 captured and is not pushed onto @found.
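
To see what the placement buys you, here is a rough sketch (my own illustration, not part of the tested snippet above) that runs the pattern twice: once with the push after the backreference, and once with a push moved in front of it. The array names @after and @before are made up for this comparison.

```perl
use strict;
use warnings;

my $string = q'abczdefzabcghijklzabczaerabrtyuabcethdauthabkudiabc';

our @after;    # filled only once the backreference has matched
our @before;   # filled every time the backreference is about to be tried

# Push placed after the backreference: only successful repeats are recorded.
$string =~ m/
              (.{3})
              (?:
                .*?
                (\1)
                (?{ push @after, pos() - length($^N); })
              )+
           /x;

# Push placed before the backreference: failed attempts are recorded too.
$string =~ m/
              (.{3})
              (?:
                .*?
                (?{ push @before, pos(); })
                \1
              )+
           /x;

print "after:  @after\n";
print "before: ", scalar(@before), " positions recorded\n";
```

The second array gets an entry for every position at which the engine tried the backreference, including all the ones where it failed, which is exactly the noise the original placement avoids.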

Also note that .*? is preferable to .+? here: 'abcabc' is a perfectly good repeated substring, but it would be missed if we required at least one character between the first 'abc' and the second.
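
A quick way to convince yourself, as a minimal sketch using a throwaway test string ($adjacent is just a name I picked for it):

```perl
use strict;
use warnings;

my $adjacent = 'abcabc';

# Lazy star allows zero characters between the two occurrences.
my $lazy_star = $adjacent =~ m/(.{3}).*?\1/ ? "match on '$1'" : 'no match';

# Lazy plus demands at least one character in between, so nothing fits.
my $lazy_plus = $adjacent =~ m/(.{3}).+?\1/ ? "match on '$1'" : 'no match';

print ".*? gives: $lazy_star\n";   # .*? gives: match on 'abc'
print ".+? gives: $lazy_plus\n";   # .+? gives: no match
```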

PS: For the life of me, I can never remember how to use \G without first looking it up yet again. Fortunately for me, my solution doesn't require it.

Dave

In reply to Re: Finding a repeating substring with a regex by davido
in thread Finding a repeating substring with a regex by chargrill
