Re: Finding repeat sequences. (Results:Part 1)

Below are the results from my first pass -- verifying basic functionality -- of (my adaptions of) the 8 solutions from tye, choroba, DamianConway, tobyink, AnomalousMonk, hdb, Eily, sundialsvc4 in this thread:

C:\test>1039630-b.pl14 -SHOW=0
Looking for 'fredfre' in 'fredfrefredfr'

       tye found 'fredfrefred'; excluded from further consideration


      svc4 found 'redfrefredf'; excluded from further consideration

Looking for 'fredfre' in 'fredfrefredf'
Looking for 'fredfre' in 'fredfrefred'

       hdb found 'none found'; excluded from further consideration

Looking for 'fredfre' in 'fredfrefre'
Looking for 'fredfre' in 'fredfrefr'
Looking for 'fredfre' in 'fredfref'
Looking for 'fredfre' in 'fredfre'
Looking for 'fredfre' in 'fredfrefredfrefredfr'

   choroba found 'fredfrefredfre'; excluded from further consideration

Looking for 'fredfre' in 'fredfrefredfrefredf'
Looking for 'fredfre' in 'fredfrefredfrefred'
Looking for 'fredfre' in 'fredfrefredfrefre'
Looking for 'fredfre' in 'fredfrefredfrefr'
Looking for 'fredfre' in 'fredfrefredfref'
Looking for 'fredfre' in 'fredfrefredfre'

      Eily found 'fredfrefred'; excluded from further consideration

Partisipants in performance tests: anomalous tobyink damianc
[download]

If authors want to correct (my adaptions of) their solutions that's great, (but please don't moan at me If I screwed the pouch adapting them to subroutines :).

Here's the test harness:

#! perl -slw
use strict;
use 5.014;
use Time::HiRes qw[ time ];

my %tests = (
    tye => sub {
        state $re = qr[^((.*?).*?)(?=.)\1*\2$];
        my $r = shift;
        $$r =~ $re and return $1;
        return;
    },
    choroba => sub {
        state $re = qr[^((.*).*)\1*\2$];
        my $r = shift;
        $$r =~ $re and return $1;
        return;
    },
    damianc => sub {
        state $re = qr[
            ^(.+?)(?{$^N}) ## $1 as $^R
            (?| \1+$()     ## Exact rep, no $2
              | \1*(.+)$   ## Partial rep as $2
                  ## Check its a proper prefix
                  (??{ $^N eq substr( $^R, 0, length($^N)) ? '' : '(?!
+)' })
            )
        ]x;
        my $r = shift;
        $$r =~ $re and return $1;
        return;
    },
    tobyink => sub {
        my $input  = shift;
        my $length = length $$input;

        for my $i ( 1 .. $length ) {
            my $possible = substr( $$input, 0, $i );
            my $repeated = $possible x ( 1 + int( $length / $i ) );
            return $possible if $$input eq substr( $repeated, 0, $leng
+th );
        }
        return;
    },
    hdb => sub {
        my $input = shift;
        my $length = length $$input;
        my $i = 0;
        my $possible;
        while( 1 ) {
            $possible = substr $$input, 0, $i+1; # increase length by 
+1
            $i = index $$input, $possible, $i+1; # find next occurence
+ of candidate
            return if $i < 0;            # if not found return full st
+ring => no repetition
            $possible = substr $$input, 0, $i;   # this is the minimum
+ length candidate
            return $possible if $$input eq substr( $possible x ( 1 + i
+nt( $length / $i ) ), 0, $length ); # success
        }
    },
    Eily => sub {
        my $input = shift;
        local $_ = reverse $$input;
        /^(.*)(.+?\1)\2*$/ and return reverse $2;
        return;
    },
    anomalous => sub {
        state $re = qr[ \A (.+?) \1* (.*) (?(?{ 0 != index $1, $2 }) (
+*FAIL)) \z ]xms;
        my $r = shift;
        $$r =~ $re and return $1;
        return
    },
    svc4 => sub {
        my $search = shift;
        my $tail_length = 1;
        my $tail_step   = int( length( $$search ) / 2 );

        while ($tail_step > 0) {
            $tail_length += $tail_step
                while substr( $$search, 0, $tail_length + $tail_step )
                eq    substr( $$search, -( $tail_length + $tail_step )
+, ( $tail_length + $tail_step ) )
            ;
            $tail_step = int( $tail_step / 2 );
        }

        my $body_length = length( $$search ) - $tail_length;
        my $longest = $body_length;
        my $n = $body_length - 1;
        while( $n > 1 ) {
            if( ( $body_length % $n ) == 0 ) {
                if( substr( $$search, 0, $n ) eq substr( $$search, $n,
+ $n ) ) {
                    $longest = $n;
                    last;
                }
            }
            $n--;
        }
        return substr( $$search, 1, $longest );
    },
);

our $SHOW //= 1;

## basic functionality
my $base = 'fredfre';
for my $rep ( 2 .. 3 ) {
    my $full = $base x $rep;
    for my $x ( 1 .. length( $base ) ) {
        my $str = substr( $full, 0, -$x ); ## make partial
        print "Looking for '$base' in '$str'";
        for my $test ( keys %tests ) {
            my $res;
            eval {
                local $SIG{ALRM} = sub { die "timeout" };
                alarm 10;
                $res = $tests{ $test }->( \$str ) // 'none found';
                alarm 0;
            };
            $SHOW and printf "%10s -> %s\n", $test, $res // $@;
            if( $res ne $base and not $base eq $str
            or $@ eq 'timeout'
            ) {
                delete $tests{ $test };
                printf "\n%10s found '%s'; excluded from further consi
+deration\n\n",
                    $test, $res // $@
                ;
            }
        }
    }
}
print "Partisipants in performance tests: @{[ keys %tests ]}";
exit;
[download]

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Comment on Re: Finding repeat sequences. (Results:Part 1) Select or Download Code

Replies are listed 'Best First'.
Re^2: Finding repeat sequences. (Results:Part 2. The winner) by BrowserUk (Patriarch) on Jun 19, 2013 at 18:06 UTC
After exclusions on functionality, the three left standing -- DamianConway, tobyink & AnomalousMonk -- went forward to performance testing where the latter quickly fell by the wayside. Of the remaining two, tobyink's solution is hands down winner with a cumulative 66 seconds version DamianConway's 1670 seconds: ... my @bases = ( 'fred', join( '', 'a'..'z' ), unpack( 'b', pack 'Q', 0 .. 99 ), ); my %res; for my $base ( @bases ) { for my $reps ( 1, 10, 100, 1000 ) { my $str = $base x $reps . substr( $base, 0, rand( length $base + ) ); for my $test ( keys %tests ) { my $start = time; my $res; eval { local $SIG{ALRM} = sub { die "timeout\n" }; alarm $T * $reps; $res = $tests{ $test }->( \$str ) // 'none found'; alarm 0; }; if( $@ eq "timeout\n" ) { delete $tests{ $test }; warn "$test timed out [@{[ $T * $reps ]}]; excluded\n" +; next; } my $stop = time; warn "$test: Found '$res' instead of '$base'\n" unless $re +s eq $base; $res{ $test }{ length( $base ) }{ $reps } = $stop - $start +; $res{ $test }{ length( $base ) }{ all } += $stop - $start; $res{ $test }{ all } += $stop - $start; printf "b:%5u in s:%10u %10s :: %f s\n", length( $base ), length( $str ), $test, $stop - $start ; } } } pp \%res; __END__ [18:23:33.12] C:\test>1039630-b.pl14 -SHOW=0 -T=10 tye found 'fredfrefred'; excluded from further consideration svc4 found 'redfrefredf'; excluded from further consideration hdb found 'none found'; excluded from further consideration choroba found 'fredfrefredfre'; excluded from further consideration Eily found 'fredfrefred'; excluded from further consideration Partisipants in performance tests: anomalous tobyink damianc anomalous timed out [10]; excluded { anomalous => { 4 => { 1 => "0.000113964080810547", 10 => "0.000288963317871094 +", 100 => "0.00277900695800781", 1000 => "0.0275580883026123", all => "0.0307400226593018", }, 26 => { 1 => "0.0011751651763916", 10 => "0.00826501846313477", +100 => "0.118575096130371", 1000 => "4.02753186225891", all => "4.15554714202881", }, all => "4.18628716468811", }, damianc => { 4 => { 1 => "0.000234842300415039", 10 => "0.000370979309082031", + 100 => "0.000273942947387695", 1000 => "0.000956058502197266", all => "0.00183582305908203", }, 26 => { 1 => "0.000357866287231445", 10 => "0.000438928604125977" +, 100 => "0.00239300727844238", 1000 => "0.0216219425201416", all => "0.0248117446899414", }, 6400 => { 1 => "1.69514513015747", 10 => "12.9105410575867", 100 +=> "135.586049079895", 1000 => "1520.64571499825", all => "1670.83745026588", }, all => "1670.86409783363", }, tobyink => { 4 => { 1 => "0.000294923782348633", 10 => "0.000179052352905273", + 100 => "5.60283660888672e-005", 1000 => "5.10215759277344e-005", all => "0.000581026077270508", }, 26 => { 1 => "7.60555267333984e-005", 10 => "0.000123977661132813 +", 100 => "0.00015711784362793", 1000 => "0.000526189804077148", all => "0.000883340835571289", }, 6400 => { 1 => "0.071134090423584", 10 => "0.232213973999023", 10 +0 => "4.46547484397888", 1000 => "61.8524870872498", all => "66.6213099956512", }, all => "66.6227743625641", }, } [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^3: Finding repeat sequences. (Results:Part 2. The winner) by hdb (Monsignor) on Jun 19, 2013 at 19:07 UTC
Congrats to tobyink! Thanks to your efforts, BrowserUK, I realized my mistakes in code and thinking. Should you ever get around to re-run your tests, here is a corrected version, don't know whether it is faster then tobyink's. `hdb => sub { my $input = shift; my $length = length $$input; my $i = 0; my $possible; my $j; while( 1 ) { $possible = substr $$input, 0, ++$i; $possible = substr $$input, 0, $i=$j if( ($j = index $$input, $p +ossible, $i) > 0 ); return $possible if substr( $$input, $i ) eq substr($$input, 0, +$length - $i); } },` [download]	[reply] [d/l]
Re^4: Finding repeat sequences. (Results:Part 3. The new winner) by BrowserUk (Patriarch) on Jun 19, 2013 at 19:46 UTC
Update: (I sent this to hdb as a /msg; and then decided it belongs in here): That is blinding! I've been looking for a solution that avoided being O(length string). Now I can stop looking. Thankyou! Nice one++ Thank you :) (0.228s -v- 70.6s) Partisipants in performance tests: hdb tobyink b: 4 in s: 7 hdb :: 0.000154 s b: 4 in s: 7 tobyink :: 0.000169 s b: 4 in s: 43 hdb :: 0.000098 s b: 4 in s: 43 tobyink :: 0.000333 s b: 4 in s: 402 hdb :: 0.000156 s b: 4 in s: 402 tobyink :: 0.000190 s b: 4 in s: 4003 hdb :: 0.000250 s b: 4 in s: 4003 tobyink :: 0.000233 s b: 26 in s: 44 hdb :: 0.000126 s b: 26 in s: 44 tobyink :: 0.000104 s b: 26 in s: 274 hdb :: 0.000040 s b: 26 in s: 274 tobyink :: 0.000106 s b: 26 in s: 2614 hdb :: 0.000031 s b: 26 in s: 2614 tobyink :: 0.000086 s b: 26 in s: 26008 hdb :: 0.000190 s b: 26 in s: 26008 tobyink :: 0.000492 s b: 6400 in s: 8044 hdb :: 0.000483 s b: 6400 in s: 8044 tobyink :: 0.052703 s b: 6400 in s: 69681 hdb :: 0.001241 s b: 6400 in s: 69681 tobyink :: 0.240158 s b: 6400 in s: 641930 hdb :: 0.017612 s b: 6400 in s: 641930 tobyink :: 4.962130 s b: 6400 in s: 6404352 hdb :: 0.208475 s b: 6400 in s: 6404352 tobyink :: 65.351273 s { hdb => { 26 => { 1 => "0.000125885009765625", 10 => "4.00543212890625e-005", 100 => "3.09944152832031e-005", 1000 => "0.000190019607543945", all => "0.000386953353881836", }, 4 => { 1 => "0.000154018402099609", 10 => "9.79900360107422e-005", 100 => "0.000156164169311523", 1000 => "0.000249862670898438", all => "0.000658035278320313", }, 6400 => { 1 => "0.000483036041259766", 10 => "0.00124096870422363", 100 => "0.0176119804382324", 1000 => "0.208475112915039", all => "0.227811098098755", }, all => "0.228856086730957", }, tobyink => { 26 => { 1 => "0.000103950500488281", 10 => "0.000106096267700195", 100 => "8.60691070556641e-005", 1000 => "0.000491857528686523", all => "0.000787973403930664", }, 4 => { 1 => "0.000169038772583008", 10 => "0.000333070755004883", 100 => "0.000190019607543945", 1000 => "0.000233173370361328", all => "0.000925302505493164", }, 6400 => { 1 => "0.0527029037475586", 10 => "0.240158081054688", 100 => "4.96213006973267", 1000 => "65.3512728214264", all => "70.6062638759613", }, all => "70.6079771518707", }, } [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^5: Finding repeat sequences. (Results:Part 3. The new winner) by hdb (Monsignor) on Jun 19, 2013 at 20:03 UTC
Re^2: Finding repeat sequences. (definitions; part $n) by tye (Sage) on Jun 20, 2013 at 00:40 UTC
Looking for 'fredfre' in 'fredfrefredfr' tye found 'fredfrefred'; excluded from further consideration Looks like you changed your definition of your desired results, so it was silly to use a "solution" tailored to a different definition. This particular example matches my original speculation of what made sense, which means you might want to try my original solution. Though, I didn't wade through everything trying to find the various redefinitions of what was desired (besides the ones given in reply to my nodes) and then sort them chronologically in order to figure out what the new requirements actually are. - tye	[reply]
Re^3: Finding repeat sequences. (definitions; part $n) by BrowserUk (Patriarch) on Jun 20, 2013 at 01:11 UTC
There is no new requirement. Re-read the OP. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]