in reply to promoting array to a hash

If all you are doing is printing a list of unique words from stdin, why not skip most of that code and do:
$/ = undef;
print "$_\n" for sort <> =~ /\b(\S+)\b(?!.*\b\1\b)/sg;
That is, use a negative lookahead to check that the word doesn't appear again later in the input (undefining $/ slurps all of stdin into one string, and /s lets .* run across newlines). It saves you the joining, splitting, grepping, and mapping :). I have not benchmarked it, though.
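A minimal sketch of the same lookahead idea wrapped in a subroutine, so it can be tried on a string (the name unique_words_regex is hypothetical). Because the lookahead rejects any word that occurs again further on, it is the last occurrence of each word that gets captured; the result still holds one entry per distinct word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sorted unique words of $text via a negative lookahead.
# \b(\S+)\b captures a word; (?!.*\b\1\b) rejects it if the same word
# appears again later. /s lets .* cross newlines, /g returns all captures.
sub unique_words_regex {
    my ($text) = @_;
    my @words = $text =~ /\b(\S+)\b(?!.*\b\1\b)/sg;
    return sort @words;
}

# Command-line use, e.g.  perl uniq_regex.pl somefile.txt
# Only runs when a file name is given, so nothing blocks waiting on stdin.
if (@ARGV) {
    my $text = do { local $/; <> };    # slurp mode: read the whole first file
    print "$_\n" for unique_words_regex($text);
}
```

Each word is kept exactly once because only its final occurrence passes the lookahead.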

Re^2: promoting array to a hash
by sleepingsquirrel (Chaplain) on Jun 14, 2004 at 17:10 UTC
    Benchmarking is worthwhile in this instance. For every candidate word, the lookahead's greedy .* runs to the end of the input and then backtracks looking for another copy, so the regex turns an O(N log N) problem (assuming the sort dominates) into an O(N^2) one. Here's the result of applying the two algorithms to the Net-HOWTO (which is 100 times smaller than the data set I initially used).
    greg@spark:~/test$ cat sleepingsquirrel
    #!/usr/bin/perl
    print "$_\n" for sort keys %{{map {$_,()} grep /^[a-z]+$/, (split /\s/, join(" ",<>))}};
    greg@spark:~/test$ time sleepingsquirrel Net-HOWTO >words.txt
    real    0m0.178s
    user    0m0.158s
    sys     0m0.016s
    greg@spark:~/test$ cat jasper
    #!/usr/bin/perl
    $/=undef;
    print "$_\n" for sort <> =~ /\b([a-z]+)\b(?!.*\b\1\b)/sg
    greg@spark:~/test$ time jasper Net-HOWTO >words2.txt
    real    1m8.477s
    user    1m8.471s
    sys     0m0.003s
    ...only about 385x slower (68.5 s vs. 0.178 s of wall-clock time). YMMV.
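For comparison, here is a minimal sketch of both approaches as subroutines (the names unique_words_hash and unique_words_regex are hypothetical), timed with the core Time::HiRes module. The hash version makes every word a key in one linear pass through a hash slice, then sorts only the unique keys; the regex version rescans the rest of the string for every word. Note that a map block returning $_, () yields just $_, so feeding its output straight into a hash pairs consecutive words as key and value; the slice avoids that. The two subs only agree on input made of lowercase words separated by whitespace, since the regex also picks up [a-z]+ runs inside tokens such as "foo,".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hash version: split on whitespace, keep all-lowercase tokens, and promote
# them to hash keys with a slice (O(N)); only the unique keys are sorted.
sub unique_words_hash {
    my ($text) = @_;
    my %seen;
    @seen{ grep { /^[a-z]+$/ } split /\s+/, $text } = ();
    return sort keys %seen;
}

# Regex version: each match's lookahead scans the rest of the input,
# which is what makes it quadratic in the input size.
sub unique_words_regex {
    my ($text) = @_;
    my @words = $text =~ /\b([a-z]+)\b(?!.*\b\1\b)/sg;
    return sort @words;
}

# Crude wall-clock comparison, e.g.  perl compare.pl Net-HOWTO
# Only runs when a file name is given.
if (@ARGV) {
    my $text = do { local $/; <> };    # slurp the whole first file
    my %impl = (hash => \&unique_words_hash, regex => \&unique_words_regex);
    for my $name (sort keys %impl) {
        my $t0 = time();
        my @u  = $impl{$name}->($text);
        printf "%-5s %6d words  %.3fs\n", $name, scalar @u, time() - $t0;
    }
}
```

Doubling the input size should roughly double the hash timing but roughly quadruple the regex timing.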