novice2015 has asked for the wisdom of the Perl Monks concerning the following question:

I have a script to eliminate duplicates. Here is the file that it reads from:

161248,/vol/filelist,CABINET
161200,/vol/filelist,INVENTORY
161400,/vol/filelist,INVENTORY

I'd like it to identify the duplicated entry (/vol/filelist,INVENTORY) and print the number from the first column of the last line (161400). This is what I have so far:

open my $FH2, '<', '/tmp/fileread'
    or die "unable to open file 'file' for reading : $!";
open my $FH6, '>', '/tmp/tst.txt'
    or die "unable to open file 'file' for reading : $!";

my %duplicates;
while (<$FH2>) {
    chomp;
    my ($column_1, $column_2, $column_3) = split /,/;
    print {$FH6} "$column_1\n"
        if defined $duplicates{$column_2} && $duplicates{$column_3};
    $duplicates{$column_2}++;
}
close $FH6;
close $FH2;

open my $fh, '<', '/tmp/tst.txt'
    or die "unable to open file 'file' for reading : $!";
while (my $line = <$fh>) {
    print $line;
}
close $fh;

This doesn't work. I can get it to work somewhat if I leave:

my ($column_1, $column_2) = split /,/;
print {$FH6} "$column_1\n" if defined $duplicates{$column_2};
$duplicates{$column_2}++;

But this causes the last 2 lines of the file to be deleted, and I want only the last line deleted, because only on the last line do both column 2 and column 3 repeat an earlier line.

So what I am really looking for is to print the number in the first column of any line whose columns 2 and 3 both match an earlier line.

161248,/vol/filelist,CABINET
161200,/vol/filelist,INVENTORY
161400,/vol/filelist,INVENTORY

In the last line, column 2 (/vol/filelist) and column 3 (INVENTORY) are the same as in the line before it. Does anyone have a suggestion?

Re: printing the first column when both columns 2 and 3 are the same
by BillKSmith (Monsignor) on Jun 27, 2016 at 16:17 UTC

    Combine columns 2 and 3 to form the key of %duplicates.

    use strict;
    use warnings;

    my %duplicates;
    while (<DATA>) {
        chomp;
        my( $tracking_number, $description ) = split /,/, $_, 2;
        print $tracking_number, "\n" if exists $duplicates{$description};
        $duplicates{$description}++;
    }
    __DATA__
    161248,/vol/filelist,CABINET
    161200,/vol/filelist,INVENTORY
    161400,/vol/filelist,INVENTORY

    Bill
      This worked great! Thanks Bill! I noticed the $_ and the 2 joined the columns as well.
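
To spell out what the `$_` and the `2` are doing: the third argument to `split` is a limit on the number of fields, so with a limit of 2 the line is cut only at the first comma and columns 2 and 3 stay together as one string. A minimal sketch, where `split_record` is just an illustrative helper name and not part of Bill's code:

```perl
use strict;
use warnings;

# Split one record into the tracking number and "everything else".
# The limit of 2 stops splitting after the first comma.
# (split_record is a hypothetical helper name used only for this example.)
sub split_record {
    my ($line) = @_;
    chomp $line;
    return split /,/, $line, 2;
}

my ($tracking_number, $description) =
    split_record("161400,/vol/filelist,INVENTORY\n");

print "$tracking_number\n";    # 161400
print "$description\n";        # /vol/filelist,INVENTORY
```

Because `$description` holds both remaining columns, using it as the hash key means a line only counts as a duplicate when columns 2 and 3 both repeat.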
Re: printing the first column when both columns 2 and 3 are the same
by talexb (Chancellor) on Jun 27, 2016 at 16:25 UTC

    It sounds like what you want to do is store the first column's value based on the most recent combined value of the second and third column's values. After you've read all of the input file, you want to dump out the results you've collected.

    So, collect everything from the first file ..

    my %data;
    open my $FH2, '<', '/tmp/fileread'
        or die "unable to open input file: $!";
    while (<$FH2>) {
        chomp;    # strip the newline so it does not end up in column 3
        my @cols = split(/,/);
        $data{join(':', @cols[1,2])} = $cols[0];
    }
    close($FH2);

    And then dump out the stuff you've collected:

    open my $FH6, '>', '/tmp/tst.txt'
        or die "unable to open output file: $!";
    foreach my $key ( sort keys %data ) {
        my @keys = split(/:/, $key);
        print $FH6 join(',', $data{$key}, @keys) . "\n";
    }
    close($FH6);

    Not tested .. that's just to give you an idea of how to solve it.
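
For anyone who wants to try the idea without touching files, here is a rough self-contained sketch of the same approach run against the three sample lines from the question. The helper name `latest_by_pair` is made up for this example, and it assumes the columns never contain a ':' character. Note that this approach keeps one line per distinct column 2/column 3 pair, so the non-duplicated CABINET line is printed as well.

```perl
use strict;
use warnings;

# Map each "column2:column3" pair to the most recent column 1 value.
# (latest_by_pair is a hypothetical name; assumes no ':' in the data.)
sub latest_by_pair {
    my (@lines) = @_;
    my %data;
    for my $line (@lines) {
        chomp $line;
        my @cols = split /,/, $line;
        $data{ join ':', @cols[1, 2] } = $cols[0];
    }
    return %data;
}

my %data = latest_by_pair(
    "161248,/vol/filelist,CABINET\n",
    "161200,/vol/filelist,INVENTORY\n",
    "161400,/vol/filelist,INVENTORY\n",
);

# One output line per distinct pair, carrying the last column 1 seen.
for my $key (sort keys %data) {
    print join(',', $data{$key}, split(/:/, $key)), "\n";
}
# prints:
# 161248,/vol/filelist,CABINET
# 161400,/vol/filelist,INVENTORY
```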

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: printing the first column when both columns 2 and 3 are the same
by kennethk (Abbot) on Jun 27, 2016 at 16:19 UTC
    Examples are key, and a couple more would be helpful to make sure we really understand what you are trying to accomplish. See "How do I post a question effectively?" and "I know what I mean. Why don't you?".

    If I understand what you are saying, you want to print when you get a match on both columns, but in your original code you are only keeping track of column 2. I would use a HASHES OF HASHES structure, with

    open my $FH2, '<', '/tmp/fileread'
        or die "unable to open file 'file' for reading : $!";
    open my $FH6, '>', '/tmp/tst.txt'
        or die "unable to open file 'file' for reading : $!";

    my %duplicates;
    while (<$FH2>) {
        chomp;
        my ($column_1, $column_2, $column_3) = split /,/;
        print {$FH6} "$column_1\n" if $duplicates{$column_2}{$column_3}++;
    }
    close $FH6;
    close $FH2;

    open my $fh, '<', '/tmp/tst.txt'
        or die "unable to open file 'file' for reading : $!";
    while (my $line = <$fh>) {
        print $line;
    }
    close $fh;

    where I have modified the classic code to remove duplicates from "How can I remove duplicate elements from a list or array?" in perlfaq4.
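
For comparison, here is a small sketch of the `%seen` idiom that the perlfaq4 entry is built around, next to the two-level variant used above. The sub names `unique` and `repeated_ids` are invented for this example. The post-increment returns the old count, which is false the first time a key is seen and true on every repeat.

```perl
use strict;
use warnings;

# perlfaq4-style idiom: keep only the first occurrence of each element.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Two-level variant: return column 1 of every record whose
# (column 2, column 3) pair has already appeared in an earlier record.
# (unique and repeated_ids are hypothetical names for illustration.)
sub repeated_ids {
    my %seen;
    my @ids;
    for my $record (@_) {
        my ($id, $path, $type) = split /,/, $record;
        push @ids, $id if $seen{$path}{$type}++;
    }
    return @ids;
}

print join(' ', unique(qw(a b a c b))), "\n";    # a b c

print join(' ', repeated_ids(
    '161248,/vol/filelist,CABINET',
    '161200,/vol/filelist,INVENTORY',
    '161400,/vol/filelist,INVENTORY',
)), "\n";                                         # 161400
```

The nested hash needs no setup: Perl autovivifies `$seen{$path}` the first time it is used as a hash reference.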

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.