GuiPerl has asked for the wisdom of the Perl Monks concerning the following question:

I am building the following data structure from a flat file. The code is as follows:
use strict;
use warnings;
use Text::CSV_XS;

my @Divisions = qw(ABER BERF CECC DADD);
my @rows;
my %AG;
my $Rec = {};
my %positions;

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# CSV file with comma-delimited data; the UTF-8 decoding is done by the open layer
open my $fh1, "<:encoding(utf-8)", "test.csv" or die "test.csv: $!";
while (my $row = $csv->getline($fh1)) {
    # do something with @$row
    if ($row->[12]) {
        push @rows, $row;
    }
    else {
        push @rows, $row;
    }
}
close $fh1 or die "test.csv: $!";

# strip_hyphen(), invert_name() and convert_date() are helpers defined elsewhere (not shown)
foreach my $rec (@rows) {
    foreach my $dept (@Divisions) {
        if ($rec->[14] =~ /^$dept/ && $rec->[15] =~ /^A\W[1-5]|B\W[1-2]/) {
            my $Rec = {
                SECTION  => $rec->[0],
                GRADE    => strip_hyphen($rec->[1]),
                POSITION => $rec->[2],
                NAME     => invert_name($rec->[3]),
                AGE      => convert_date($rec->[4]),
                GENDER   => $rec->[5],
            };
            # group the records by the value in column 10
            push @{ $AG{ $rec->[10] } }, $Rec;
        }
    }
}

foreach my $A (sort keys %AG) {
    foreach my $p (@{ $AG{$A} }) {
        print $p->{'GRADE'}, " ", $p->{'NAME'}, " ", $p->{'POSITION'},
              $p->{'AGE'}, " ", $p->{'GENDER'}, "\n";
    }
}
Output of the sample data structure using Data::Dumper:
$VAR1 = 'ABER - Advanced Technologies';
$VAR2 = [
    {
        'NAME' => 'J. Green',
        'DATE_OF_BIRTH' => '8/18/1959',
        'SECTION' => 'ABER',
        'POSITION' => 'DIRECTOR',
        'AGE' => 55,
        'GRADE' => 'B2'
    }
];
$VAR3 = 'BERF - Satellite Research';
$VAR4 = [
    {
        'NAME' => 'P. Smith',
        'DATE_OF_BIRTH' => '12/11/1957',
        'SECTION' => 'BERF',
        'POSITION' => 'CHIEF',
        'AGE' => 56,
        'GRADE' => 'B1'
    },
    {
        'NAME' => 'R. Forest',
        'DATE_OF_BIRTH' => '1/18/1954',
        'SECTION' => 'BERF',
        'POSITION' => 'SENIOR OFFICER',
        'AGE' => '60 GREEN',
        'GRADE' => 'A5'
    },
    {
        'NAME' => 'R.Forest',
        'DATE_OF_BIRTH' => '03/09/1964',
        'SECTION' => 'BERF',
        'POSITION' => 'SENIOR OFFICER',
        'AGE' => 'Vacant',
        'GRADE' => 'A5'
    },
    {
        'NAME' => 'K. King',
        'DATE_OF_BIRTH' => '8/9/1960',
        'SECTION' => 'BERF',
        'POSITION' => 'SENIOR OFFICER',
        'AGE' => 54,
        'FEMALE' => '',
        'GRADE' => 'A5'
    },
];

What I need to do is count how many times each GRADE value (B1, A5, A4, etc.) occurs, and then delete the GRADE value from the repeated records, so that a grade that occurs more than once appears only once in the output.

Expected Output:

B2, J. Green, DIRECTOR,55,M
B1,P.Smith,CHIEF,54,M
A5,R.Forest,SENIOR OFFICER,60,M
K.King,SENIOR OFFICER,54, M (A5 is excluded because it appears more than once)
P.Turner,50, M (A5 is excluded because it appears more than once)

Any pointers would really be appreciated.

Replies are listed 'Best First'.
Re: Finding Duplicates and Deleting in a Complex Data Structure
by hdb (Monsignor) on Sep 05, 2014 at 13:21 UTC

    Instead of removing duplicate grades, you should just suppress printing them. If you first sort your data by grade and then skip printing any grade that repeats the previous one, you should get what you want. It could look like this (based on simplified data):

    use strict;
    use warnings;

    my $data = [
        { 'NAME' => 'J. Green',  'GRADE' => 'B2' },
        { 'NAME' => 'P. Smith',  'GRADE' => 'B1' },
        { 'NAME' => 'R. Forest', 'GRADE' => 'A5' },
        { 'NAME' => 'R.Forest',  'GRADE' => 'A5' },
        { 'NAME' => 'K. King',   'GRADE' => 'A5' },
    ];

    my $previous_grade = '';
    for my $item ( sort { $a->{'GRADE'} cmp $b->{'GRADE'} } @$data ) {
        my( $grade, $name ) = ( $item->{'GRADE'}, $item->{'NAME'} );
        # print the grade only when it differs from the previous one;
        # otherwise pad with spaces so the names stay aligned
        print $grade eq $previous_grade ? ( ' ' x ( length( $grade )+1 ) ) : "$grade,";
        print "$name\n";
        $previous_grade = $grade;
    }

    gives you

    A5,R. Forest
       R.Forest
       K. King
    B1,P. Smith
    B2,J. Green
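The same idea can be applied to a hash of arrays like the %AG structure in the question, by sorting and suppressing repeats within each division separately. The following is only a sketch under assumptions: the sample data is made up to resemble the dump above, lines_for_division is a hypothetical helper name, and the tie-break on NAME is an addition so that the order within one grade does not depend on the stability of sort.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like %AG in the question:
# division name => array of record hashes.
my %AG = (
    'ABER - Advanced Technologies' => [
        { NAME => 'J. Green', GRADE => 'B2' },
    ],
    'BERF - Satellite Research' => [
        { NAME => 'P. Smith',  GRADE => 'B1' },
        { NAME => 'R. Forest', GRADE => 'A5' },
        { NAME => 'K. King',   GRADE => 'A5' },
    ],
);

# Return the output lines for one division, blanking repeated grades.
# (lines_for_division is an illustrative name, not from the thread.)
sub lines_for_division {
    my ($records) = @_;
    my $previous_grade = '';
    my @lines;
    # Sort by grade, then by name (the name tie-break is an assumption).
    for my $item ( sort { $a->{GRADE} cmp $b->{GRADE}
                          or $a->{NAME} cmp $b->{NAME} } @$records ) {
        my $grade  = $item->{GRADE};
        my $prefix = $grade eq $previous_grade
                   ? ' ' x ( length($grade) + 1 )   # pad instead of repeating
                   : "$grade,";
        push @lines, $prefix . $item->{NAME};
        $previous_grade = $grade;
    }
    return @lines;
}

for my $division ( sort keys %AG ) {
    print "$division\n";
    print "$_\n" for lines_for_division( $AG{$division} );
}
```

Because $previous_grade is reset on every call, a grade that appears in two different divisions is printed once in each of them. If the suppression should apply across all divisions instead, the records would have to be flattened into one list before sorting.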
      Thanks a million. By the way, how would I count the number of B2s, A5s, etc.?

        You could use a hash and count within the loop:

        my $previous_grade = '';
        my %grade_count;
        for my $item ( sort { $a->{'GRADE'} cmp $b->{'GRADE'} } @$data ) {
            my( $grade, $name ) = ( $item->{'GRADE'}, $item->{'NAME'} );
            print $grade eq $previous_grade ? ( ' ' x ( length( $grade )+1 ) ) : "$grade,";
            print "$name\n";
            $previous_grade = $grade;
            # tally how many records carry this grade
            $grade_count{$grade}++;
        }
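The loop above fills the hash but never prints it. As a minimal sketch of how the tallies could be reported, the counting can also be done in a separate pass, which makes the totals available before any line is printed. The count_grades helper name and the report format are assumptions for illustration; the data is the simplified sample from the reply above.

```perl
use strict;
use warnings;

# Simplified sample data, as in the earlier reply.
my $data = [
    { 'NAME' => 'J. Green',  'GRADE' => 'B2' },
    { 'NAME' => 'P. Smith',  'GRADE' => 'B1' },
    { 'NAME' => 'R. Forest', 'GRADE' => 'A5' },
    { 'NAME' => 'R.Forest',  'GRADE' => 'A5' },
    { 'NAME' => 'K. King',   'GRADE' => 'A5' },
];

# Count how many records carry each grade and return a hash reference.
# (count_grades is an illustrative name, not from the thread.)
sub count_grades {
    my ($records) = @_;
    my %count;
    $count{ $_->{'GRADE'} }++ for @$records;
    return \%count;
}

my $grade_count = count_grades($data);

# Report every grade with its number of occurrences.
for my $grade ( sort keys %$grade_count ) {
    printf "%s: %d\n", $grade, $grade_count->{$grade};
}

# Grades that occur more than once, i.e. the ones suppressed on repeat.
my @duplicated = grep { $grade_count->{$_} > 1 } sort keys %$grade_count;
print "Duplicated: @duplicated\n";
```

With the sample data this reports three A5 records and one each for B1 and B2, and lists A5 as the only duplicated grade.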