Re^2: An efficient way to gather a common portion of several strings' beginnings

Replies are listed 'Best First'.
Re^3: An efficient way to gather a common portion of several strings' beginnings by LanX (Saint) on Nov 15, 2015 at 22:02 UTC
~~True~~ Maybe true but A) simple is beautiful :) B) arbitrary input where performance matters ? This won't have any meaningful substrings other than "" Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply]
Re^3: An efficient way to gather a common portion of several strings' beginnings by LanX (Saint) on Nov 17, 2015 at 20:57 UTC
Maybe of interest: I tried to beat sort, where much of the result is useless, cause we only need the first and last element and nothing in between. The results are not too spectacular, though its requires less memory: use warnings; use strict; use Time::HiRes qw/time/; my $common = join "","a".."z"; my @x = map { $common . rand 1 } 1..1e6; my (@a,$a,$b,$start); print "\n--- with full sort\n"; @a=@x; $start=time(); my @b= sort @a; print $b[0],"\n",$b[-1],"\n"; print time -$start,"\n"; print "\n--- with triple sort\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; ($a,undef,$b) = sort($a,$_,$b) for @a; print $a,"\n",$b,"\n"; print time -$start,"\n"; print "\n--- with assignment\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; for my $x (@a) { $a = ($x,$x,$a)[$a cmp $x]; #next if $a eq $x; $b = ($x,$b,$x)[$b cmp $x]; } print $a,"\n",$b,"\n"; print time -$start,"\n"; print "\n--- with goto\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; for my $x (@a) { goto ("NEXT", "NEWMIN", "MAYBEMAX")[$a cmp $x]; NEWMIN: $a=$x; next; MAYBEMAX: goto ("NEXT", "NEXT", "NEWMAX" )[$b cmp $x]; NEWMAX: $b=$x; NEXT: } print $a,"\n",$b,"\n"; print time -$start,"\n"; [download] output: --- with full sort abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 7.66956496238708 --- with triple sort abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 5.88848209381104 --- with assignment abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 1.2906858921051 --- with goto abcdefghijklmnopqrstuvwxyz0.000100396248079448 abcdefghijklmnopqrstuvwxyz9.99150346814304e-06 2.99558115005493 [download] Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!} update either I'm working too hard or Alzheimer is knocking at the door. I completely forgot about `le` and `ge` and spend too much effort into emulation `print "\n--- with le/ge\n"; @a=@x; $start=time(); $a= shift @a; $b= shift @a; for my $x (@a) { $a = $x if $a ge $x; $b = $x if $b le $x; } print $a,"\n",$b,"\n"; print time -$start,"\n"; __END__ --- with le/ge abcdefghijklmnopqrstuvwxyz0.000100420329498974 abcdefghijklmnopqrstuvwxyz9.95282924378671e-05 0.773550033569336` [download]	[reply] [d/l] [select]
Re^3: An efficient way to gather a common portion of several strings' beginnings by LanX (Saint) on Nov 17, 2015 at 00:36 UTC
> Update. Some benchmarking ... your version is of course faster `DB<129> push @a, join "",a..z,rand(1) for 1..1e6 => "" DB<130> $start=time;print lcp_v4(@a);time -$start => 1.25800704956055 abcdefghijklmnopqrstuvwxyz DB<131> $start=time;print lcp_v5(@a);time -$start => 6.93630909919739 abcdefghijklmnopqrstuvwxyz` [download] > # however, these assume strings don't end with \0 See thats the benefit of a simple algorithm, sort doesn't care about \0, one could easily replace the comparison of 2 lines with something stable without notable impact. Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply] [d/l]
Re^3: An efficient way to gather a common portion of several strings' beginnings by LanX (Saint) on Nov 17, 2015 at 22:48 UTC
> Update 2. Sometimes one is the most oblivious to the most obvious. Added the third version lcp_v6. I could call using modules cheating, but a) it's core b) you have to admit that hippo's sort approach is sexy! `--- with le/ge abcdefghijklmnopqrstuvwxyz0.000100071512918021 abcdefghijklmnopqrstuvwxyz9.83499014282074e-05 0.786839962005615 --- with minstr/maxstr abcdefghijklmnopqrstuvwxyz0.000100071512918021 abcdefghijklmnopqrstuvwxyz9.83499014282074e-05 0.23346209526062` [download] Though your final comparison fails with some edge cases :-P Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply] [d/l]

update