Here's a method that doesn't require that you first determine what is being repeated:
use strict;
use warnings;
use diagnostics;

our @found;

#              0         1         2         3         4         5
#              012345678901234567890123456789012345678901234567890
my $string = q'abczdefzabcghijklzabczaerabrtyuabcethdauthabkudiabc';
#              abc     abc       abc          abc              abc
#              0       8         18           31               48

if( $string =~ m/
        (.{3})
        (?:
            .*?
            (\1)
            (?{ push @found, pos() - length($^N); })
        )+
    /x
) {
    print "$1: @found\n";
}
Note the placement of the (?{...}) code after the \1 backreference. The code block only runs once the backreference has matched, so you don't see all the backtracked dead ends, only the branches that actually worked out. Because the 'push' happens after the backreference has matched, pos() is already at the end of the submatch, and you have to subtract the length of the submatch ($^N) from it to get the starting position. Also keep in mind that the first occurrence (offset 0 here) is matched by the leading capture rather than by the backreference, so it is not pushed onto @found; its offset is available as $-[1].
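To make the pos() arithmetic concrete, here is a small variation on the snippet above that records both the raw pos() value and the adjusted one. It is only a sketch: the names @starts and @ends are mine, not part of the original code.

```perl
use strict;
use warnings;

# Package variables, as in the original snippet, so the embedded
# code block can reach them. @starts and @ends are illustrative names.
our ( @starts, @ends );

my $string = 'abczdefzabcghijklzabczaerabrtyuabcethdauthabkudiabc';

# Inside the code block, pos() is the offset just past the text that
# (\1) matched, and $^N holds that text. Subtracting its length gives
# the offset where the repeat begins.
if ( $string =~ m/
        (.{3})
        (?:
            .*?
            (\1)
            (?{
                push @ends,   pos();
                push @starts, pos() - length($^N);
            })
        )+
    /x
) {
    my $first = $-[1];    # offset of the leading capture
    print "first '$1' at $first\n";
    print "repeat starts: @starts\n";    # 8 18 31 48
    print "repeat ends:   @ends\n";      # 11 21 34 51
}
```

Each end offset is exactly three more than the matching start offset, which is the length of the captured substring.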
Also note that, in this case, .*? is preferable to .+?, because 'abcabc' is a perfectly valid repeated substring, and it would be missed if we required at least one character between 'abc' and 'abc'.
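A quick way to see the difference is to run both quantifiers against a string where the repeats are adjacent. This is a stripped-down sketch without the code block; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $adjacent = 'abcabc';

# .*? allows zero characters between the capture and its repeat.
my $lazy_star = $adjacent =~ m/ (.{3}) (?: .*? \1 )+ /x ? 1 : 0;

# .+? demands at least one character in between, so nothing fits.
my $lazy_plus = $adjacent =~ m/ (.{3}) (?: .+? \1 )+ /x ? 1 : 0;

print "with .*?: ", ( $lazy_star ? "match" : "no match" ), "\n";    # match
print "with .+?: ", ( $lazy_plus ? "match" : "no match" ), "\n";    # no match
```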
PS: For the life of me, I can never remember how to use \G without first looking it up yet again. Fortunately for me, my solution doesn't require it.
Dave
In reply to Re: Finding a repeating substring with a regex
by davido
in thread Finding a repeating substring with a regex
by chargrill