gctaylor1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm using a self-study book to learn programming, and an end-of-chapter exercise has me stumped. Well, actually I've come up with a way that solves the problem, but not in a way I think the author intended.

It's a multi-part problem, but here's the part in question.
1. The program will read each line of the datebook file, giving each person a 10% raise in salary.
2. If, however, the person appears more than once in the file (assume having the same first and last name means it is a duplicate), he will be given a raise the first time, but if he appears again, he will be skipped over.
3. Send each line of output to a file called raise.
4. The raise file should not contain any person's name more than once.
5. It will also reflect the 10% raise in pay.
6. Display on the screen the average salary for all the people in the datebook file.
7. For duplicate entries, print the names of those who appear in the file more than once and how many times each appears.

After trial and error for a couple of days I decided to use the examples I found in the Perl FAQ for dealing with duplicate entries in arrays. I'd been resisting that approach because the FAQ examples use concepts I'm unfamiliar with, and I feel that I've missed the concept the author was trying to convey.

I was having difficulty finding the duplicate entries and then knowing what to do with them while still staying close to the author's guidelines. Most of the suggestions on the web point to using a hash when dealing with duplicate array elements. My book hasn't covered hashes much yet, and in the past the exercises have been solvable using something that's already been explicitly shown in an example. So I was trying to solve this using arrays, control structures, and regexps. Is that even possible?

I tried using a foreach loop to take the first element and compare it against the rest of the elements. That kind of worked: I know I'll get at least one match for each element, since it matches itself, and I can account for that by simply subtracting one if there are more matches. But I couldn't figure out what to do when an element is present n times. I tried nested for loops too, and that kind of works, but it still breaks down once you get n+1 elements.

Here's what I ended up with, based on the FAQ examples:

#!/usr/bin/perl
use strict;
use warnings;

my @unique = ();
my %seen   = ();

foreach my $elem (@originalarray) {
    next if $seen{$elem}++;
    push @unique, $elem;
}

my ( @union, @intersection, @difference );
my %count;
my $element;
@union = @intersection = @difference = ();
%count = ();

foreach $element ( @originalarray, @unique ) { $count{$element}++ }

foreach $element ( keys %count ) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

while ( ( my $key, my $value ) = each(%count) ) {
    print "", ( $value - 2 ) . " " . $key if ( $value >= 3 );
}

Here's a sample of the data:

Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
Lesle Kerstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600
JonDeLoach:408-253-3122:123 Park St., San Jose, CA 94086:7/25/53:85100

And here are the results, which I'm happy with:

1 Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
2 Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
1 Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
If you made it this far, thank you, and I'd appreciate any advice you can give.

Re: Saving array duplicates, not using a hash?
by lamp (Chaplain) on Sep 28, 2008 at 06:21 UTC
    Here is what I would do:
    1) Open the datebook input file.
    2) Open the 'raise' file for writing in append mode.
    3) Iterate through the input file.
    4) Split each line on ':'. As I understand from the sample above, the name, phone number, address, date, and salary are separated by ':'.
    5) Build a hash using the name as the key.
    6) Calculate the 10% increment and add it to the salary.
    7) Print the values to the 'raise' file.
    8) Close both file handles.
    9) Iterate through the hash to print the names that appeared more than once in the input file.

    You can refer to the following untested code for more information.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %seen = ();

    open(FH, ">>raise.txt");
    while (<DATA>) {
        my ($name, $phoneno, $address, $date, $salary) = split(/:/);
        next if $seen{$name}++;
        #$seen{$name}++;
        $salary += ($salary * 10) / 100;
        print FH "$name:$phoneno:$address:$date:$salary\n";
    }
    close(FH);

    map {
        print "$_ appeared $seen{$_} times in the input file\n" if ($seen{$_} > 1);
    } keys(%seen);

    __END__
    Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
    Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
    Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
    Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
    Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
    Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
    Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
    Lesle Kerstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600
    JonDeLoach:408-253-3122:123 Park St., San Jose, CA 94086:7/25/53:85100
    Console output:
    Barbara Kerz appeared 4 times in the input file
    JonDeLoach appeared 2 times in the input file
    Norma Corder appeared 3 times in the input file
    Lesle Kerstin appeared 2 times in the input file
    Tommy Savage appeared 3 times in the input file
    Raise file output:
    Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:295350
    Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:270270
    Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:37620
    Lesle Kerstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:57860
    JonDeLoach:408-253-3122:123 Park St., San Jose, CA 94086:7/25/53:93610
    Update: commented out one of the hash increments as mentioned by GrandFather

      That increments $seen{$name} twice the first time $name is seen - once in the test and again in the controlled block. You don't need the increment in the controlled block. The count shown in the console output should have alerted you to the problem.
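
      In other words, the test line alone does all the counting; a minimal illustration:

          next if $seen{$name}++;   # records this sighting, then skips the line if the name was seen before
          # a second $seen{$name}++ in the loop body would count every first sighting twice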


      Perl reduces RSI - it saves typing
Re: Saving array duplicates, not using a hash?
by graff (Chancellor) on Sep 28, 2008 at 14:09 UTC
    It seems like you might be making your learning process more difficult or time consuming than it needs to be. Don't worry about guessing what the author's intent may have been regarding the particular code that should solve a given exercise.

    The author knows there are many ways to solve each exercise with Perl, and your real task here is simply to figure out at least one way to do it and move on to the next exercise or the next chapter. If hashes have been mentioned at all prior to this particular exercise, then use a hash. Even if the book hasn't mentioned hashes yet, but you know about them from studying other sources, use a hash and move on. You should not assume that the author wants you to come up with some other sub-optimal solution that will be of no use to you in the future.

    As you get further along in that book, if you find this sort of exercise being repeated with a focus on using hashes, you'll just get through that part more quickly because you've already learned it.

    I know of a teaching method that works like this: (1) introduce some basic tools; (2) build familiarity and facility with these using simple exercises; (3) introduce more complex exercises, forcing students to do lots of tedious, mind-numbing, time-consuming work; (4) introduce some advanced tools and show how the more complex problems can be solved quickly and easily.

    This was the method used when I was first introduced to calculus in high school, and it was pretty effective, but it takes a lot longer than just saying "when you have this sort of problem, here is the quickest, easiest way to solve it." These days (nearly 40 years later), I tend to favor the latter approach -- life is short, and you should take the long way around only if you really enjoy doing that and it holds some greater value for you personally (and you actually have the extra time to spare).

Re: Saving array duplicates, not using a hash?
by nobull (Friar) on Sep 28, 2008 at 07:20 UTC
    If the input data is known to be ordered so that duplicates are always adjacent then the problem simplifies to:
    use strict;
    use warnings;

    my $seen = '';

    while (<DATA>) {
        my ($name, $phoneno, $address, $date, $salary) = split(/:/);
        next if $seen eq $name;
        $seen = $name;
        $salary += ($salary * 10) / 100;
        print "$name:$phoneno:$address:$date:$salary\n";
    }

    __DATA__
    Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
    Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
    Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
    Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
    Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
    Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
    Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
    Lesle Kerstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600
    JonDeLoach:408-253-3122:123 Park St., San Jose, CA 94086:7/25/53:85100

    When dealing with very large data sets it can make sense to use a highly optimised external sort tool such as GNU sort to put the data into an order that allows you to process it with O(1) memory usage. In this case a plain lexical sort is all that's needed.
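
    For example, a minimal (untested) sketch along those lines, assuming the records live in a file called 'datebook': open a pipe from the external sort and reuse the adjacent-duplicate check above.

    # let the external sort group the duplicates, then stream the sorted records
    open my $sorted, '-|', 'sort', 'datebook'
        or die "Cannot run sort: $!";

    my $seen = '';
    while (<$sorted>) {
        my ($name, $phoneno, $address, $date, $salary) = split /:/;
        next if $seen eq $name;
        $seen = $name;
        $salary += ($salary * 10) / 100;
        print "$name:$phoneno:$address:$date:$salary\n";
    }
    close $sorted;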

    For smaller data sets stick with the usual hash approach. If you happen to know that the data will be sorted anyhow then you can use the hashless approach for smaller data but it is probably not worth it.
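
    For reference, the usual hash approach is only a couple of lines; an untested sketch, assuming the records are already in @lines and the name is the first colon-separated field:

    my %seen;
    # keep only the first record for each name
    my @unique = grep { !$seen{ (split /:/)[0] }++ } @lines;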

    There is also the option of using Perl's own sort to make the duplicates adjacent, but that is rarely worthwhile: if the data already has to fit in memory to be sorted, you might as well just use a hash.

Re: Saving array duplicates, not using a hash?
by Anonymous Monk on Sep 28, 2008 at 06:20 UTC
    Is that even possible?
    Absolutely; that chapter should have taught you how. Here's some pseudocode:
    for each ELEMENT of ARRAY
        if ELEMENT found in OTHERARRAY then
            print ELEMENT
        else
            add ELEMENT to OTHERARRAY
            calculate raise
            ...
        endif
    endfor
    ...
    Guess what: the part that checks for duplicates without hashes looks exactly like this loop :)
      1. I don't understand what I would use for OTHERARRAY?
      2. The if compares to a single element in OTHERARRAY right? How would I get at the single element unless I use another for loop to cycle through it?
        1. I don't understand what I would use for OTHERARRAY?
        Something like @OTHERARRAY or @SOMETHING, @POOPS, @MCGEE, @FOO, @BAR, @BAZ, @MOO, @SHOO, @GCTAYLOR1, @ELEMENTSSEENALREADY... :) understand? ARRAY is one name, OTHERARRAY is another.

        2. The if compares to a single element in OTHERARRAY right? How would I get at the single element unless I use another for loop to cycle through it?
        Almost: it compares against every single element in OTHERARRAY, and you use another loop for that. Loops are the basis of computers (everything is a loop, from RAM to CPU). It's one of those fundamentals you must master, so stick with it and it will stick to you :)
        See foreach and string comparison.
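
        For instance, an untested sketch of that inner loop, with placeholder variable names, checking each name against everything seen so far:

        my @seen_names;                    # this plays the role of OTHERARRAY
        while (<DATA>) {
            my ($name) = split /:/;        # first field is the name
            my $is_dup = 0;
            foreach my $seen (@seen_names) {
                if ( $name eq $seen ) {    # string comparison
                    $is_dup = 1;
                    last;                  # found a duplicate, stop looking
                }
            }
            next if $is_dup;               # skip duplicates
            push @seen_names, $name;       # remember this name
            # ... calculate the raise and print the line here
        }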

        After you've got that loop version working, next step is to do it without an explicit loop. You'll be using index and join :)
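
        For example (an untested sketch): join the names you've already seen into one string, using a separator that can't appear in a name, and let index do the searching.

        my @seen_names;
        my $name = 'Barbara Kerz';    # hypothetical current name
        my $haystack = "\t" . join( "\t", @seen_names ) . "\t";
        if ( index( $haystack, "\t$name\t" ) >= 0 ) {
            # duplicate: skip it
        }
        else {
            push @seen_names, $name;  # first sighting
        }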

Re: Saving array duplicates, not using a hash?
by Anonymous Monk on Sep 28, 2008 at 06:09 UTC
    What book?
      Perl By Example 4th Edition - Ellie Quigley