comment on

Actually, there's a much simpler solution (codewise):

sub findCycle {
  my $str = @_ ? shift : '';
  $str =~ /^(.*?)\1*$/;
  return wantarray ? ( length $1, $1 ) : length $1;
}
[download]

Though I suspect that, for long strings, a solution like the one you proposed but testing only cycles whose lengths are divisors of the input length would be faster. I'll do some benchmarking when I get a chance.

Update:

Caveat: take my benchmarks with a barrel of salt; despite many burns, I still manage to botch benchmarks routinely.

OK, one could test many possible variants. Below I test the case of a string of length about 10_000, consisting of a repeating pattern whose length is given as an input to the benchmarking script. I test three solutions: (0) the original algorithm (plus the minor optimization of not testing cycles longer than half the original string); (1) the modification of the original that tests only cycles whose lengths are divisors of the input string's slength; (2) the regex solution above. For a cycle length of 101 (in a string of length 9999), the results are:

    Rate    0    1    2
0  906/s   -- -58% -85%
1 2143/s 137%   -- -64%
2 5966/s 559% 178%   --
[download]

So, contrary to my expectation, the regex solution wins handily. On the other hand, for a cycle-free string of comparable length (9973), solution 1 is the clear winner:

    Rate      0      2      1
0 17.0/s     --   -89%   -99%
2  155/s   812%     --   -94%
1 2465/s 14431%  1494%     --
[download]

And for short cycles in a long string, your original solution cleans house:

     Rate     2     1     0
2   222/s    --  -96%  -98%
1  5023/s 2161%    --  -52%
0 10479/s 4615%  109%    --
[download]

That's for a cycle of length 3 in a string of length 9999.

Full code below:

use strict;
use warnings;
use Math::Pari 'divisors';
use Benchmark 'cmpthese';

my $len = shift || 3;

srand 0;

{
  $::input = join '', map int( rand 9 ), 1..$len;
  last if length $::input == findCycle_0( $::input );
  redo;
}

$::input x= int( 10_000 / $len );

cmpthese( -1, {
               0 => 'scalar findCycle_0( $::input )',
               1 => 'scalar findCycle_1( $::input )',
               2 => 'scalar findCycle_2( $::input )',
              } );

sub findCycle_0 {
  my $str = shift;
  my $copy = $str;
  my $strLen = length $str;
  for ( 1 .. $strLen/2 ) {
    $copy .= substr $copy, 0, 1, '';
    return wantarray  ? ($_, substr $str, 0, $_)
      : $_ if $str eq $copy;
  }
  return wantarray ? ($strLen, $str) : $strLen;
}

sub findCycle_1 {
  my $str = shift;
  my $strLen = length $str;
  for ( @{ divisors( $strLen ) } ) {
    my $copy = $str;
    $copy .= substr( $copy, 0, $_, '' );
    return wantarray  ? ($_, substr $str, 0, $_)
      : $_ if $str eq $copy;
  }
}

sub findCycle_2 {
  my $str = @_ ? shift : '';
  $str =~ /^(.*?)\1*$/;
  return wantarray ? ( length $1, $1 ) : length $1;
}
[download]

the lowliest monk

In reply to Re: Find repeated patterns in strings by tlm
in thread Find repeated patterns in strings by GrandFather

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.