Chaps,
I had a go benchmarking various ways of pulling unique values out of a list which I had seen in various text books. It looks like a hash slice is the quickest on my ancient hardware. Your mileage may vary.
#!/usr/local/bin/perl
#
use warnings;
use strict;
use Benchmark;
our @data = <DATA>;
chomp @data;
our $rcHash = sub
{
my %seen = ();
$seen{$_} ++ for @data;
return keys %seen;
};
our $rcHashGrep = sub
{
my %seen = ();
return grep {! $seen{$_} ++} @data;
};
our $rcHashSlice = sub
{
my %uniq;
@uniq{@data} = ();
return keys %uniq;
};
our $rcListHash = sub
{
my %seen = ();
my @uniq = ();
foreach my $item (@data)
{
push @uniq, $item unless $seen{$item} ++;
}
return @uniq;
};
our $rcMapHash = sub
{
return keys %{{map {$_ => 1} @data}};
};
timethese(5000, {
Hash => $rcHash,
HashGrep => $rcHashGrep,
HashSlice => $rcHashSlice,
ListHash => $rcListHash,
MapHash => $rcMapHash});
__END__
red
blue
yellow
green
black
white
purple
mauve
pink
grey
violet
black
white
blue
green
red
mauve
violet
black
red
blue
yellow
green
black
white
purple
mauve
pink
grey
violet
black
white
blue
green
red
mauve
violet
black
mauve
violet
black
red
blue
yellow
green
black
violet
black
red
blue
yellow
mauve
pink
grey
violet
black
white
blue
green
yellow
green
black
iolet
black
red
green
black
white
purple
mauve
pink
yellow
green
black
violet
black
red
blue
yellow
mauve
pink
grey
violet
black
white
blue
green
Produces the following metrics.
Benchmark: timing 5000 iterations of Hash, HashGrep, HashSlice, ListHash, MapHash...
Hash: 4 wallclock secs ( 3.48 usr + 0.00 sys = 3.48 CPU) @ 1436.78/s (n=5000)
HashGrep: 4 wallclock secs ( 3.29 usr + 0.00 sys = 3.29 CPU) @ 1519.76/s (n=5000)
HashSlice: 1 wallclock secs ( 1.16 usr + 0.00 sys = 1.16 CPU) @ 4310.34/s (n=5000)
ListHash: 5 wallclock secs ( 5.03 usr + 0.00 sys = 5.03 CPU) @ 994.04/s (n=5000)
MapHash: 6 wallclock secs ( 5.89 usr + 0.00 sys = 5.89 CPU) @ 848.90/s (n=5000)
Cheers, JohnGG |