in reply to Find repeated patterns in strings
Actually, there's a much simpler solution (codewise):

    sub findCycle {
        my $str = @_ ? shift : '';
        $str =~ /^(.*?)\1*$/;
        return wantarray ? ( length $1, $1 ) : length $1;
    }

Though I suspect that, for long strings, a solution like the one you proposed but testing only cycles whose lengths are divisors of the input length would be faster. I'll do some benchmarking when I get a chance.
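To make the regex version's behaviour concrete, here is a small self-contained check. The test strings are my own, chosen only for illustration: one with a clean cycle, one where the repetition doesn't cover the whole string, and one with a single repeated character.

```perl
use strict;
use warnings;

# Same regex approach as above: the lazy group (.*?) grows one character
# at a time until the rest of the string is made of whole copies of it.
sub findCycle {
    my $str = @_ ? shift : '';
    $str =~ /^(.*?)\1*$/;
    return wantarray ? ( length $1, $1 ) : length $1;
}

my ( $len, $cycle ) = findCycle('abcabcabc');
print "$len $cycle\n";                          # 3 abc

# 'abcabcab' is not a whole number of copies of 'abc', so the only
# "cycle" that fits is the entire string.
print scalar( findCycle('abcabcab') ), "\n";    # 8

print scalar( findCycle('aaaa') ), "\n";        # 1
```

One thing to keep in mind: without the /s modifier, `.` does not match a newline, so this is only a sketch for single-line input; strings with embedded newlines would presumably need `/^(.*?)\1*$/s` or similar.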
Update:
Caveat: take my benchmarks with a barrel of salt; despite many burns, I still manage to botch benchmarks routinely.
OK, one could test many possible variants. Below I test the case of a string of length about 10_000, consisting of a repeating pattern whose length is given as an input to the benchmarking script. I test three solutions: (0) the original algorithm (plus the minor optimization of not testing cycles longer than half the original string); (1) the modification of the original that tests only cycles whose lengths are divisors of the input string's length; (2) the regex solution above. For a cycle length of 101 (in a string of length 9999), the results are:

         Rate     0     1     2
    0   906/s    --  -58%  -85%
    1  2143/s  137%    --  -64%
    2  5966/s  559%  178%    --

So, contrary to my expectation, the regex solution wins handily. On the other hand, for a cycle-free string of comparable length (9973), solution 1 is the clear winner:

         Rate      0     2     1
    0  17.0/s     --  -89%  -99%
    2   155/s   812%    --  -94%
    1  2465/s 14431% 1494%    --

And for short cycles in a long string, your original solution cleans house:

          Rate     2     1     0
    2    222/s    --  -96%  -98%
    1   5023/s 2161%    --  -52%
    0  10479/s 4615%  109%    --

That's for a cycle of length 3 in a string of length 9999.
Full code below:

    use strict;
    use warnings;

    use Math::Pari 'divisors';
    use Benchmark 'cmpthese';

    my $len = shift || 3;

    srand 0;

    # Build a random digit string of length $len with no internal cycle;
    # retry until findCycle_0 reports the full length as the cycle.
    {
        $::input = join '', map int( rand 9 ), 1..$len;
        last if length $::input == findCycle_0( $::input );
        redo;
    }

    # Repeat the pattern to get a string of length about 10_000.
    $::input x= int( 10_000 / $len );

    cmpthese( -1,
              { 0 => 'scalar findCycle_0( $::input )',
                1 => 'scalar findCycle_1( $::input )',
                2 => 'scalar findCycle_2( $::input )',
              } );

    # (0) Original algorithm: rotate one character at a time,
    # stopping at half the string length.
    sub findCycle_0 {
        my $str = shift;
        my $copy = $str;
        my $strLen = length $str;

        for ( 1 .. $strLen/2 ) {
            $copy .= substr $copy, 0, 1, '';
            return wantarray ? ($_, substr $str, 0, $_) : $_
                if $str eq $copy;
        }
        return wantarray ? ($strLen, $str) : $strLen;
    }

    # (1) Rotate only by lengths that divide the string length.
    sub findCycle_1 {
        my $str = shift;
        my $strLen = length $str;

        for ( @{ divisors( $strLen ) } ) {
            my $copy = $str;
            $copy .= substr( $copy, 0, $_, '' );
            return wantarray ? ($_, substr $str, 0, $_) : $_
                if $str eq $copy;
        }
    }

    # (2) Regex solution.
    sub findCycle_2 {
        my $str = @_ ? shift : '';
        $str =~ /^(.*?)\1*$/;
        return wantarray ? ( length $1, $1 ) : length $1;
    }

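If Math::Pari isn't available, the divisor-based variant can be sketched in core Perl. The helper `divisors_of` and the sub name `findCycle_divisors` below are hypothetical, not part of the code that was benchmarked; the helper uses plain trial division, which should be cheap for lengths around 10_000. I have not benchmarked this version, so the numbers above say nothing about it.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the benchmarked code): all divisors of $n
# in ascending order, found by trial division up to sqrt($n).
sub divisors_of {
    my $n = shift;
    my ( @small, @large );
    for ( my $d = 1; $d * $d <= $n; $d++ ) {
        next if $n % $d;
        push @small, $d;
        unshift @large, $n / $d if $d * $d != $n;
    }
    return ( @small, @large );
}

# Same idea as the divisor-based solution: rotate the string left by each
# divisor of its length and return the first rotation that leaves it unchanged.
sub findCycle_divisors {
    my $str    = shift;
    my $strLen = length $str;

    for my $d ( divisors_of($strLen) ) {
        my $copy = $str;
        $copy .= substr( $copy, 0, $d, '' );    # rotate left by $d characters
        return wantarray ? ( $d, substr( $str, 0, $d ) ) : $d
            if $str eq $copy;
    }
    return wantarray ? ( 0, '' ) : 0;           # only reached for the empty string
}

print scalar( findCycle_divisors('abcabcabc') ), "\n";    # 3
```

Because the divisors come back in ascending order and the full length is always among them, the first match is the shortest cycle, and any non-empty string returns from inside the loop.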
the lowliest monk