PerlMonks
Re: Saving array duplicates, not using a hash?
by nobull (Friar) on Sep 28, 2008 at 07:20 UTC ( id://714136 )
If the input data is known to be ordered so that duplicates are always adjacent then the problem simplifies to:
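The code sample that followed did not survive in this copy of the post. A minimal sketch of the idea, assuming array data and string comparison (names here are illustrative, not from the original):

```perl
use strict;
use warnings;

my @data = qw(apple apple banana cherry cherry);

# Keep an element only when it differs from the one before it.
# This needs only a single scalar of extra state, not a hash.
my @unique;
my $prev;
for my $item (@data) {
    push @unique, $item unless defined $prev && $item eq $prev;
    $prev = $item;
}

print "@unique\n";    # apple banana cherry
```

Because only adjacent elements are compared, this works only when duplicates are guaranteed to be adjacent, i.e. when the input is sorted (or at least grouped).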
When dealing with very large data sets it can make sense to use a highly optimised external sort tool such as GNU sort to put the data into an order that allows you to process it with O(1) memory usage; in this case the required ordering is a simple sort. For smaller data sets, stick with the usual hash approach. If you happen to know the data will arrive already sorted, you can use the hashless approach even for smaller data sets, though it is probably not worth the trouble. There is also the option of using Perl's built-in sort, but since it holds the entire list in memory it is usually not a good option here.
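The external-tool pipeline described above can be sketched from the shell (the file name is hypothetical):

```shell
# Sample unsorted data with duplicates
printf 'banana\napple\nbanana\ncherry\napple\n' > data.txt

# GNU sort does a disk-backed external merge sort, so it can order
# data far larger than RAM; uniq then drops adjacent duplicates
# while holding only one previous line in memory.
sort data.txt | uniq
# apple
# banana
# cherry
```

With GNU sort specifically, `sort -u data.txt` folds the uniq step into the sort itself.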
In Section: Seekers of Perl Wisdom