harry34 has asked for the wisdom of the Perl Monks concerning the following question:

If I have the following output, how would I code it to output a list of the duplicate data only (e.g. A(01) in this case)?
It needs to be coded in a general fashion, as the numbers and letters could change.

@array =

A(01)
B(02)
C(03)
A(01)
D(04)
E(05)

Thanks for your help Harry

Replies are listed 'Best First'.
Re: finding duplicate data
by gjb (Vicar) on Jan 21, 2004 at 10:49 UTC

    Sounds very much like homework, so I'll just give you a tip: a hash (e.g. %data) would come in handy. You can use the data (A(01), B(02), etc) as keys and the number of times you encounter the data as values.

    $data{$line}++;
    would do the trick.

    As a final step, you iterate over the keys (the function keys is useful here) in the hash and print those keys that have values larger than 1.
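    A minimal sketch of the approach gjb describes — counting occurrences in a hash, then printing the keys seen more than once — might look like the following (the array contents are taken from the question; variable names are illustrative):

        use strict;
        use warnings;

        my @array = ('A(01)', 'B(02)', 'C(03)', 'A(01)', 'D(04)', 'E(05)');

        # count how many times each item appears
        my %data;
        $data{$_}++ for @array;

        # keep only the items that appeared more than once
        my @dups = grep { $data{$_} > 1 } keys %data;
        print "$_\n" for @dups;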

    Hope this helps, -gjb-

Re: finding duplicate data
by borisz (Canon) on Jan 21, 2004 at 10:54 UTC
    Use a hash to count the number of occurrences of each string.
    #!/usr/bin/perl
    while (<DATA>) {
        chomp;
        $h{$_}++;
    }
    for ( sort grep { $h{$_} != 1 } keys %h ) {
        print "$_\n";
    }
    __DATA__
    A(01)
    B(02)
    C(03)
    A(01)
    D(04)
    E(05)
    Boris
      That works great!
      What is the second part of the code doing, i.e. how is it working?
        Which part has you puzzled? grep? for () implicitly setting $_? (If the former, you should be able to run "perldoc -f grep" to get a description of what grep does. If for some reason you have a broken perl that doesn't include perldoc, try here.)
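        To illustrate the two pieces mentioned above: grep filters a list, keeping only the elements for which the block is true (with each element aliased to $_), and a for loop with no named loop variable also sets $_ on each iteration. A small sketch, using made-up counts:

            use strict;
            use warnings;

            # hypothetical counts, as %h would hold after the while loop
            my %h = ( 'A(01)' => 2, 'B(02)' => 1, 'C(03)' => 1 );

            # grep keeps only the keys whose count is not 1
            my @dups = grep { $h{$_} != 1 } sort keys %h;

            # for without a loop variable aliases each element to $_
            for (@dups) {
                print "$_\n";
            }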
Re: finding duplicate data
by l3nz (Friar) on Jan 21, 2004 at 12:39 UTC
    This is a one-liner. As you can see, you can set the threshold at which items show up by tuning the constant. Hope this helps.

    map { print $_ if ( $h{$_}++ == 1 ) } <DATA>;
    __DATA__
    A(01)
    A(01)
    A(01)
    B(02)
    C(03)
    A(01)
    D(04)
    E(05)
    ...
    One-liners are definitely funny.
Re: finding duplicate data
by chimni (Pilgrim) on Jan 21, 2004 at 11:50 UTC

    You could also do it in a compact command-line manner.
    To find only the duplicate entries:
    perl -ne 'print if $h{$_}++' filename
    To find unique data:
    perl -ne 'print unless $h{$_}++' filename
    Of course, you could simply do cat filename | uniq at the shell prompt for the second case.
    HTH,
    chimni
      Useless use of cat, and a misunderstanding of uniq (it only looks for duplicate adjacent lines). Instead, use
      sort -u filename
      to find unique data (though the perl solution would be doing less work, and would not scramble the line order).

      To list the duplicate entries only once,

      perl -ne 'print if $h{$_}++ == 1' filename

      The PerlMonk tr/// Advocate