in reply to Re: Selective printing of the Duplicates
in thread Selective printing of the Duplicates

Thank you! Note that my data file is space-delimited and the records are not exact duplicates:
30380868 N Sep 29 356200 AGEC682569 ATI + S
30380868 ** N Sep 29 356200 AGEC682569 ATI + S
I'm going to split each record and look at column 1 (here, 30380868). The second column is either ** or empty; all the other columns are the same. I want to print the record that has ** in the second column. The code you provided only gives me the first of the two duplicates, which is:
30380868 N Sep 29 356200 AGEC682569 ATI + S

Re^3: Selective printing of the Duplicates
by Not_a_Number (Prior) on Jan 30, 2013 at 18:57 UTC
    my %seen;
    for ( reverse <DATA> ) {
        # Comparison key: the record without its '**' marker,
        # with runs of whitespace collapsed to single spaces
        ( my $unstarred = $_ ) =~ s/\*\*//;
        $unstarred = join ' ', split ' ', $unstarred;
        print unless $seen{ $unstarred }++;
    }
    __DATA__
    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
    71130740 N Sep 7 SM9481 AGEC683966 ATI + S
    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    32450045 N Jul 14 SN9672 AGEC685203 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S

    The only difference from your 'desired output' is that this code prints it out in reverse order. But I don't see this as a problem, as there seems to be no inherent order in the input/output anyway...
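If the input order does matter, or a starred record might appear before its unstarred twin, a two-pass variant avoids the reverse trick. This is only a sketch: `prefer_starred` is a hypothetical helper name, and it assumes two records count as duplicates when they are identical apart from the `**` marker and whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the records to keep, in input order.
sub prefer_starred {
    my @lines = @_;

    # Comparison key: the record without '**', whitespace collapsed
    my $key_of = sub {
        my ($line) = @_;
        ( my $key = $line ) =~ s/\*\*//;
        return join ' ', split ' ', $key;
    };

    # Pass 1: note which keys have a starred version
    my %starred;
    for my $line (@lines) {
        $starred{ $key_of->($line) } = 1 if $line =~ /\*\*/;
    }

    # Pass 2: keep one record per key, skipping an unstarred record
    # whenever a starred twin exists somewhere in the input
    my ( %kept, @keep );
    for my $line (@lines) {
        my $key = $key_of->($line);
        next if $starred{$key} && $line !~ /\*\*/;
        push @keep, $line unless $kept{$key}++;
    }
    return @keep;
}

my @records = (
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);
print "$_\n" for prefer_starred(@records);
```

The cost is holding all records in memory for the second pass, which is the same trade-off `reverse <DATA>` already makes.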

Re^3: Selective printing of the Duplicates
by Kenosis (Priest) on Jan 30, 2013 at 19:18 UTC

    My apologies, as I misunderstood. Try the following:

    use strict;
    use warnings;

    my %seen;

    while (<DATA>) {
        chomp;
        my ($col1) = /(\d+)/;
        $seen{$col1} = $_ if /\*\*/;
        $seen{$col1} = $_ if !exists $seen{$col1} or $seen{$col1} !~ /\*\*/;
    }

    print "$seen{$_}\n" for keys %seen;

    __DATA__
    11111111 ** N Sep 29 356200 AGEC682569 ATI + S
    11111111 N Sep 29 356200 AGEC682569 ATI + S
    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
    71130740 N Sep 7 SM9481 AGEC683966 ATI + S
    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    32450045 N Jul 14 SN9672 AGEC685203 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S

    Output:

    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    11111111 ** N Sep 29 356200 AGEC682569 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S

    The above will preferentially keep starred records, regardless of the order in which the records are processed. Because the results are printed with keys %seen, the output order is arbitrary.
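If the output should follow the input order, one option is to record each key the first time it is seen and to test the second field explicitly after a `split`, as described in the question. The following is a sketch under those assumptions; `dedupe_by_key` is a hypothetical name, and the first whitespace-separated field is assumed to identify a record.

```perl
use strict;
use warnings;

# Hypothetical helper: one record per first-column key, starred preferred,
# returned in the order each key first appeared.
sub dedupe_by_key {
    my @lines = @_;
    my ( %best, @order );

    for my $line (@lines) {
        my ( $key, $flag ) = split ' ', $line;    # column 1 and column 2
        next unless defined $key;                 # skip blank lines

        push @order, $key unless exists $best{$key};

        my $is_starred = defined $flag && $flag eq '**';

        # Store the first record for a key; replace it only with a starred one
        $best{$key} = $line if !exists $best{$key} || $is_starred;
    }
    return map { $best{$_} } @order;
}

my @records = (
    '11111111 ** N Sep 29 356200 AGEC682569 ATI + S',
    '11111111 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);
print "$_\n" for dedupe_by_key(@records);
```

Comparing the second field with `eq '**'` rather than matching `/\*\*/` anywhere in the line avoids a false positive if `**` ever turns up in a later column.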

      Thanks a ton! This is exactly what I was looking for.

        You're most welcome!