in reply to Re^2: removing duplicate lines
in thread removing duplicate lines
use strict;
use warnings;

use Benchmark qw(cmpthese);

# Read the test data from the DATA section, one line per element.
our @lines = <DATA>;
chomp @lines;

# Hash slice: every line becomes a hash key, then the keys are sorted.
our $rsHashSlice = sub {
    my %uniques;
    @uniques{@lines} = ();
    my @sorted;
    push @sorted, $_ for sort keys %uniques;
    return @sorted;
};

# Seen hash: keep each line the first time it appears. The result is in
# input order; it is only sorted here because the data already is.
our $rsSeen = sub {
    my %seen;
    my @sorted;
    foreach (@lines) {
        push @sorted, $_ unless $seen{$_}++;
    }
    return @sorted;
};

cmpthese(100000, {
    HashSlice => $rsHashSlice,
    Seen      => $rsSeen,
});

__END__
black
black
black
black
black
black
black
black
black
black
black
black
black
black
black
black
blue
blue
blue
blue
blue
blue
blue
blue
blue
green
green
green
green
green
green
green
green
green
green
grey
grey
grey
grey
iolet
mauve
mauve
mauve
mauve
mauve
mauve
mauve
mauve
pink
pink
pink
pink
pink
purple
purple
purple
red
red
red
red
red
red
red
red
violet
violet
violet
violet
violet
violet
violet
violet
violet
white
white
white
white
white
white
white
yellow
yellow
yellow
yellow
yellow
yellow
yellow
produces
             Rate      Seen HashSlice
Seen      18939/s        --      -34%
HashSlice 28571/s       51%        --
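One thing to keep in mind when reading those numbers: the two subs only return the same list because the test data is already sorted. As a rough sketch of the difference on unsorted input (the sub names `uniq_hash_slice` and `uniq_seen` and the sample list are made up for illustration and are not part of the benchmark):

```perl
use strict;
use warnings;

# Hypothetical unsorted input, not the benchmark data.
my @lines = qw(red blue red green blue);

# Hash slice: hash keys come back in no particular order, so sort them.
sub uniq_hash_slice {
    my %uniques;
    @uniques{@_} = ();
    return sort keys %uniques;
}

# Seen hash: keeps the first occurrence of each line, in input order.
sub uniq_seen {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @by_slice = uniq_hash_slice(@lines);
my @by_seen  = uniq_seen(@lines);

print "HashSlice: @by_slice\n";   # HashSlice: blue green red
print "Seen:      @by_seen\n";    # Seen:      red blue green
```

So if the original line order matters, the seen-hash approach is probably the one to reach for, even though it came out slower in this particular run.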
I have returned a list in each case as that seems to be closer in essence to the print in the OP than the list reference I would normally use.
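For comparison, a minimal sketch of the list-reference style mentioned above might look like this. The name `uniq_ref` and the sample input are assumptions for illustration only, and this variant was not part of the benchmark:

```perl
use strict;
use warnings;

# Hypothetical helper: the same "seen" technique, but it takes and
# returns an array reference instead of a list.
sub uniq_ref {
    my ($lines_ref) = @_;
    my %seen;
    return [ grep { !$seen{$_}++ } @$lines_ref ];
}

# The caller dereferences the result when it needs the elements.
my $uniq = uniq_ref([qw(black black blue green green)]);
print "$_\n" for @$uniq;
```

Returning a reference avoids copying the whole list on return, which may matter more with large inputs than it does with the small data set here.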
Cheers,
JohnGG