in reply to Re: An efficient way to gather a common portion of several strings' beginnings
in thread An efficient way to gather a common portion of several strings' beginnings

Lexical sort is n*log(n)*m (character comparisons) vs n*m of the basic loop. It is certainly not optimal for arbitrary inputs.

Update. Some benchmarking shows either of the two following versions ought to perform adequately, in practice. I'll leave it to the reader's discretion...

# however, these assume strings don't contain \0 sub lcp_v4 { my ($e, $s) = ("", @_); $e |= $_ ^ $s for @_; $e =~ m/^\0*/; substr($s, 0, $+[0]); } sub lcp_v5 { my ($a, $b) = (sort @_)[0,-1]; ($a ^ $b) =~ m/^\0*/; substr($a, 0, $+[0]); } use List::Util qw(minstr maxstr); sub lcp_v6 { my ($a, $b) = (&minstr, &maxstr); ($a ^ $b) =~ m/^\0*/; substr($a, 0, $+[0]); }

Update 2. Sometimes one is the most oblivious to the most obvious. Added the third version lcp_v6.

Replies are listed 'Best First'.
Re^3: An efficient way to gather a common portion of several strings' beginnings
by LanX (Saint) on Nov 15, 2015 at 22:02 UTC
    True Maybe true but

    A) simple is beautiful :)

    B) arbitrary input where performance matters ? This won't have any meaningful substrings other than ""

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re^3: An efficient way to gather a common portion of several strings' beginnings
by LanX (Saint) on Nov 17, 2015 at 20:57 UTC

    Maybe of interest:

    I tried to beat sort, where much of the result is useless, cause we only need the first and last element and nothing in between.

    The results are not too spectacular, though its requires less memory:

    use warnings; use strict; use Time::HiRes qw/time/; my $common = join "","a".."z"; my @x = map { $common . rand 1 } 1..1e6; my (@a,$a,$b,$start); print "\n--- with full sort\n"; @a=@x; $start=time(); my @b= sort @a; print $b[0],"\n",$b[-1],"\n"; print time -$start,"\n"; print "\n--- with triple sort\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; ($a,undef,$b) = sort($a,$_,$b) for @a; print $a,"\n",$b,"\n"; print time -$start,"\n"; print "\n--- with assignment\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; for my $x (@a) { $a = ($x,$x,$a)[$a cmp $x]; #next if $a eq $x; $b = ($x,$b,$x)[$b cmp $x]; } print $a,"\n",$b,"\n"; print time -$start,"\n"; print "\n--- with goto\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; for my $x (@a) { goto ("NEXT", "NEWMIN", "MAYBEMAX")[$a cmp $x]; NEWMIN: $a=$x; next; MAYBEMAX: goto ("NEXT", "NEXT", "NEWMAX" )[$b cmp $x]; NEWMAX: $b=$x; NEXT: } print $a,"\n",$b,"\n"; print time -$start,"\n";

    output:

    --- with full sort abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 7.66956496238708 --- with triple sort abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 5.88848209381104 --- with assignment abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 1.2906858921051 --- with goto abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 2.99558115005493

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

    update

    either I'm working too hard or Alzheimer is knocking at the door.

    I completely forgot about le and ge and spend too much effort into emulation

    print "\n--- with le/ge\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; for my $x (@a) { $a = $x if $a ge $x; $b = $x if $b le $x; } print $a,"\n",$b,"\n"; print time -$start,"\n"; __END__ --- with le/ge abcdefghijklmnopqrstuvwxyz0.000100420329498974 abcdefghijklmnopqrstuvwxyz9.95282924378671e-05 0.773550033569336
Re^3: An efficient way to gather a common portion of several strings' beginnings
by LanX (Saint) on Nov 17, 2015 at 00:36 UTC
    > Update. Some benchmarking ...

    your version is of course faster

    DB<129> push @a, join "",a..z,rand(1) for 1..1e6 => "" DB<130> $start=time;print lcp_v4(@a);time -$start => 1.25800704956055 abcdefghijklmnopqrstuvwxyz DB<131> $start=time;print lcp_v5(@a);time -$start => 6.93630909919739 abcdefghijklmnopqrstuvwxyz

    > # however, these assume strings don't end with \0

    See thats the benefit of a simple algorithm, sort doesn't care about \0, one could easily replace the comparison of 2 lines with something stable without notable impact.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re^3: An efficient way to gather a common portion of several strings' beginnings
by LanX (Saint) on Nov 17, 2015 at 22:48 UTC
    > Update 2. Sometimes one is the most oblivious to the most obvious. Added the third version lcp_v6.

    I could call using modules cheating, but

    a) it's core

    b) you have to admit that hippo's sort approach is sexy!

    --- with le/ge abcdefghijklmnopqrstuvwxyz0.000100071512918021 abcdefghijklmnopqrstuvwxyz9.83499014282074e-05 0.786839962005615 --- with minstr/maxstr abcdefghijklmnopqrstuvwxyz0.000100071512918021 abcdefghijklmnopqrstuvwxyz9.83499014282074e-05 0.23346209526062

    Though your final comparison fails with some edge cases :-P

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!