hmm ... actually I didn't want to invest time coding for this xy-nonsense.
Sorry, but no one forced you to. You asked questions, I did my best to answer them. That's all.
FWIW: Converted to a form that allows it to be used in a realistic and repeatable scenario:
#! perl -slw
use strict;
use Time::HiRes qw[ time ];

$|++;

sub uniq{ my %x; @x{@_} = (); keys %x }

my $start = time;

my @uniq = uniq <>;
chomp @uniq;
@uniq = sort{ length $b <=> length $a } @uniq;

my $longest = shift @uniq;
print $longest;   ## the longest string always survives the filtering

for my $x ( @uniq ) {
    next if 1+ index $longest, $x;
    print $x;
    $longest .= "\n" . $x;
}

printf STDERR "Took %.3f\n", time() - $start;
Whilst you're ~10% quicker for small numbers of strings:
c:\test>906020 906020.10e3 > 906020.filtered
Took 48.854

c:\test>wc -l 906020.filtered
5000 906020.filtered

c:\test>906020-lanx 906020.10e3 > lanx.filtered
Took 43.122

c:\test>wc -l lanx.filtered
4999 lanx.filtered

c:\test>906020 906020.10e3 > 906020.filtered     (inline version)
Took 21.744

c:\test>wc -l 906020.filtered
5000 906020.filtered
As your own timings show, as the number of strings increases, the cost of constantly reallocating your accumulator string in order to append each new one starts to dominate. I suspect that by the time you get to the OP's 200,000 strings you're going to be considerably slower. (You also have an out-by-one error somewhere, but that is probably easily fixed.)
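FWIW, one way to sidestep that reallocation cost is to grow the accumulator's buffer once up front. This is a minimal sketch, not benchmarked against the files above, and it assumes the input lines contain no NUL bytes: vec extends the string with NUL padding in a single allocation, then four-argument substr splices each kept string over the padding in place, so appends never reallocate.

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

$|++;

sub uniq{ my %x; @x{@_} = (); keys %x }

my $start = time;

my @uniq = uniq <>;
chomp @uniq;
@uniq = sort{ length $b <=> length $a } @uniq;

## Worst case: every string survives, each preceded by a "\n" separator.
my $total = 0;
$total += 1 + length for @uniq;

my $longest = shift @uniq;
print $longest;

my $pos = length $longest;          ## logical end of the real content
vec( $longest, $total, 8 ) = 0;     ## grow the buffer once; pads with NULs

for my $x ( @uniq ) {
    ## NUL padding cannot match any input character, so searching the
    ## whole padded buffer is safe; it is just a slightly longer scan.
    next if 1+ index $longest, $x;
    print $x;
    ## Same-length replacement overwrites the padding in place: no realloc.
    substr( $longest, $pos, 1 + length $x, "\n" . $x );
    $pos += 1 + length $x;
}

printf STDERR "Took %.3f\n", time() - $start;

Whether the saving outweighs scanning the extra padding would need measuring, but it keeps the single index probe per string, which is the part that scales.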
In reply to Re^11: list of unique strings, also eliminating matching substrings
by BrowserUk
in thread list of unique strings, also eliminating matching substrings
by lindsay_grey