printing the first column when both columns 2 and 3 are the same

novice2015 has asked for the wisdom of the Perl Monks concerning the following question:

I have a script to eliminate duplicates. Here is the file that it reads from:

161248,/vol/filelist,CABINET
161200,/vol/filelist,INVENTORY
161400,/vol/filelist,INVENTORY

I'd like it to identify the duplicated line (/vol/filelist,INVENTORY) and print the number from the first column of the last such line (161400 here). This is what I have so far:

open my $FH2, '<', '/tmp/fileread' or die "unable to open file 'file' for reading : $!";
open my $FH6, '>', '/tmp/tst.txt' or die "unable to open file 'file' for reading : $!";

my %duplicates;
while (<$FH2>) {
    chomp;
    my ($column_1, $column_2, $column_3) = split /,/;
    print {$FH6} "$column_1\n" if defined $duplicates{$column_2} && $duplicates{$column_3};
    $duplicates{$column_2}++;
}
close $FH6;
close $FH2;
open my $fh, '<', '/tmp/tst.txt' or die "unable to open file 'file' for reading : $!";
while (my $line = <$fh>) {
    print $line;
}
close $fh;

This doesn't work. I can get it to work somewhat if I leave:

    my ($column_1, $column_2) = split /,/;
    print {$FH6} "$column_1\n" if defined $duplicates{$column_2};
    $duplicates{$column_2}++;

But this causes the last 2 lines of the file to be deleted, and I want only the last line deleted, because only on the last line are both columns 2 and 3 the same.

So what I am really looking for is to print the # in the first column which corresponds to both column 2 and 3 being the same.

161248,/vol/filelist,CABINET
161200,/vol/filelist,INVENTORY
161400,/vol/filelist,INVENTORY

In the last line, column 2 (/vol/filelist) and column 3 (INVENTORY) are the same.

Does anyone have a suggestion?


Replies are listed 'Best First'.
Re: printing the first column when both columns 2 and 3 are the same by BillKSmith (Monsignor) on Jun 27, 2016 at 16:17 UTC
Combine columns 2 and 3 to form the key of %duplicates.

use strict;
use warnings;

my %duplicates;
while (<DATA>) {
    chomp;
    my( $tracking_number, $description ) = split /,/, $_, 2;
    print $tracking_number, "\n" if exists $duplicates{$description};
    $duplicates{$description}++;
}
__DATA__
161248,/vol/filelist,CABINET
161200,/vol/filelist,INVENTORY
161400,/vol/filelist,INVENTORY

Bill
Re^2: printing the first column when both columns 2 and 3 are the same by novice2015 (Acolyte) on Jun 27, 2016 at 17:53 UTC
This worked great! Thanks Bill! I noticed the $_ and the 2 joined the columns as well.
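To spell out what the third argument to split does in Bill's code, here is a minimal sketch. The sample line is taken from the question's data; the variable names are only illustrative. With a limit of 2, split stops after the first comma, so columns 2 and 3 are never separated in the first place and come back as one string that can serve directly as the hash key.

```perl
use strict;
use warnings;

# Sample line taken from the question's data.
my $line = '161400,/vol/filelist,INVENTORY';

# No limit: every comma separates a field, giving three columns.
my @all = split /,/, $line;

# Limit of 2: split stops after the first comma, so the rest of the
# line (columns 2 and 3, comma included) stays together as one string.
my ( $number, $rest ) = split /,/, $line, 2;

print scalar(@all), " fields without a limit\n";
print "number: $number\n";
print "rest:   $rest\n";
```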
Re: printing the first column when both columns 2 and 3 are the same by talexb (Chancellor) on Jun 27, 2016 at 16:25 UTC
It sounds like what you want to do is store the first column's value based on the most recent combined value of the second and third columns' values. After you've read all of the input file, you want to dump out the results you've collected. So, collect everything from the first file:

my %data;
open my $FH2, '<', '/tmp/fileread' or die "unable to open input file: $!";
while (<$FH2>) {
    chomp;    # strip the newline so it does not end up in column 3
    my @cols = split(/,/);
    $data{join(':', @cols[1,2])} = $cols[0];
}
close($FH2);

And then dump out the stuff you've collected:

open my $FH6, '>', '/tmp/tst.txt' or die "unable to open output file: $!";
foreach my $key ( sort keys %data ) {
    my @keys = split(/:/, $key);
    print $FH6 join(',', $data{$key}, @keys) . "\n";
}
close($FH6);

Not tested .. that's just to give you an idea of how to solve it.

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
Re: printing the first column when both columns 2 and 3 are the same by kennethk (Abbot) on Jun 27, 2016 at 16:19 UTC
Examples are key, and a couple more would be helpful to make sure we really understand what you are trying to accomplish. See "How do I post a question effectively?" and "I know what I mean. Why don't you?".

If I understand what you are saying, you want to print when you get a match on both columns, but in your original code you are only keeping track of column 2. I would use a HASHES OF HASHES structure, with

open my $FH2, '<', '/tmp/fileread' or die "unable to open file 'file' for reading : $!";
open my $FH6, '>', '/tmp/tst.txt' or die "unable to open file 'file' for writing : $!";

my %duplicates;
while (<$FH2>) {
    chomp;
    my ($column_1, $column_2, $column_3) = split /,/;
    print {$FH6} "$column_1\n" if $duplicates{$column_2}{$column_3}++;
}
close $FH6;
close $FH2;
open my $fh, '<', '/tmp/tst.txt' or die "unable to open file 'file' for reading : $!";
while (my $line = <$fh>) {
    print $line;
}
close $fh;

where I have modified the classic code to remove duplicates from "How can I remove duplicate elements from a list or array?" in perlfaq4.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
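The test-and-count in a single expression may be worth a self-contained illustration. The sketch below uses the question's three lines as in-memory data instead of the files, and find_repeats is a hypothetical helper name, not something from the thread. It relies on post-increment returning the old count, which is 0 (false) the first time a column 2 / column 3 pair appears and true afterwards.

```perl
use strict;
use warnings;

# Return the first column of every line whose (column 2, column 3) pair
# has already appeared on an earlier line.
# find_repeats is a hypothetical helper name used for illustration.
sub find_repeats {
    my @lines = @_;
    my %seen;
    my @repeats;
    for my $line (@lines) {
        my ( $number, $path, $label ) = split /,/, $line;

        # Post-increment yields the old count: 0 (false) on the first
        # occurrence of the pair, 1 or more (true) on later ones.
        push @repeats, $number if $seen{$path}{$label}++;
    }
    return @repeats;
}

# The three lines from the question.
my @repeats = find_repeats(
    '161248,/vol/filelist,CABINET',
    '161200,/vol/filelist,INVENTORY',
    '161400,/vol/filelist,INVENTORY',
);
print "$_\n" for @repeats;    # prints 161400
```

Compared with a combined string key, the nested hash avoids having to pick a separator that cannot occur in the data.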