Chaps,

I had a go at benchmarking various ways of pulling unique values out of a list, all of which I had seen in various textbooks. It looks like a hash slice is the quickest on my ancient hardware, at nearly three times the rate of the next-best approach. Your mileage may vary.

#!/usr/local/bin/perl
#
use warnings;
use strict;

use Benchmark;

our @data = <DATA>;
chomp @data;

our $rcHash = sub {
    my %seen = ();
    $seen{$_} ++ for @data;
    return keys %seen;
};

our $rcHashGrep = sub {
    my %seen = ();
    return grep {! $seen{$_} ++} @data;
};

our $rcHashSlice = sub {
    my %uniq;
    @uniq{@data} = ();
    return keys %uniq;
};

our $rcListHash = sub {
    my %seen = ();
    my @uniq = ();
    foreach my $item (@data) {
        push @uniq, $item unless $seen{$item} ++;
    }
    return @uniq;
};

our $rcMapHash = sub {
    return keys %{{map {$_ => 1} @data}};
};

timethese(5000, {
    Hash      => $rcHash,
    HashGrep  => $rcHashGrep,
    HashSlice => $rcHashSlice,
    ListHash  => $rcListHash,
    MapHash   => $rcMapHash});

__END__
red
blue
yellow
green
black
white
purple
mauve
pink
grey
violet
black
white
blue
green
red
mauve
violet
black
red
blue
yellow
green
black
white
purple
mauve
pink
grey
violet
black
white
blue
green
red
mauve
violet
black
mauve
violet
black
red
blue
yellow
green
black
violet
black
red
blue
yellow
mauve
pink
grey
violet
black
white
blue
green
yellow
green
black
iolet
black
red
green
black
white
purple
mauve
pink
yellow
green
black
violet
black
red
blue
yellow
mauve
pink
grey
violet
black
white
blue
green

It produces the following timings.

Benchmark: timing 5000 iterations of Hash, HashGrep, HashSlice, ListHash, MapHash...
Hash: 4 wallclock secs ( 3.48 usr + 0.00 sys = 3.48 CPU) @ 1436.78/s (n=5000)
HashGrep: 4 wallclock secs ( 3.29 usr + 0.00 sys = 3.29 CPU) @ 1519.76/s (n=5000)
HashSlice: 1 wallclock secs ( 1.16 usr + 0.00 sys = 1.16 CPU) @ 4310.34/s (n=5000)
ListHash: 5 wallclock secs ( 5.03 usr + 0.00 sys = 5.03 CPU) @ 994.04/s (n=5000)
MapHash: 6 wallclock secs ( 5.89 usr + 0.00 sys = 5.89 CPU) @ 848.90/s (n=5000)
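One thing worth bearing in mind when reading those numbers: the five subs are not strictly interchangeable. The ones that finish with keys (Hash, HashSlice, MapHash) hand the values back in no particular order, whereas HashGrep and ListHash keep the first occurrence of each value in its original position. Here is a minimal sketch of the two flavours side by side. The sub names uniq_unordered and uniq_ordered and the short colour list are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Hash slice: assign an empty list to a slice keyed by every item.
# Duplicates collapse onto the same key; the order of keys is undefined.
sub uniq_unordered {
    my %uniq;
    @uniq{@_} = ();
    return keys %uniq;
}

# grep with a %seen hash: keeps the first occurrence of each item,
# so the original order survives.
sub uniq_ordered {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @colours = qw(red blue red green blue black);

# Sort the unordered result so the printed line is repeatable.
my @unordered = uniq_unordered(@colours);
print join(' ', sort @unordered), "\n";          # black blue green red

print join(' ', uniq_ordered(@colours)), "\n";   # red blue green black
```

So if the order of the original list matters, the grep version is probably the one to reach for, even though the hash slice wins the benchmark.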

Cheers,

JohnGG


In reply to Re^3: What does 'next if $hash{$elem}++;' mean? by johngg
in thread What does 'next if $hash{$elem}++;' mean? by Win
