johnirl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks
I am attempting to remove duplicate lines from a semicolon-delimited file. To the human eye the duplicated lines appear identical, but for some reason the uniq command doesn't see them as duplicates. I was hoping to use Perl to remove them.

Below is the file, followed by the same file run through od -ta:

B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
SomeName;Blondie;server15;unix;1;1;32
SomeName;Blondie;server15;unix;2;2;43
SomeName;Blondie;server15;unix;3;3;54
SomeName;Blondie;server15;unix;4;4;65:
B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
SomeName;Blondie;server12;unix;1;5;76
SomeName;Blondie;server12;unix;2;6;87
SomeName;Blondie;server12;unix;3;7;89
SomeName;Blondie;server12;unix;4;8;09;

0000000   B   1  sp   n   a   m   e   ;   B   1  sp   B   m   k   ;   B
0000020   1  sp   h   o   s   t   ;   B   1  sp   a   r   c   h   ;   B
0000040   1  sp   m   e   m   ;   B   1  sp   d   v   r   ;   B   1  sp
0000060   a   r   w   ;  nl   S   o   m   e   N   a   m   e   ;   B   l
0000100   o   n   d   i   e   ;   s   e   r   v   e   r   1   5   ;   u
0000120   n   i   x   ;   1   ;   1   ;   3   2  nl   S   o   m   e   N
0000140   a   m   e   ;   B   l   o   n   d   i   e   ;   s   e   r   v
0000160   e   r   1   5   ;   u   n   i   x   ;   2   ;   2   ;   4   3
0000200  nl   S   o   m   e   N   a   m   e   ;   B   l   o   n   d   i
0000220   e   ;   s   e   r   v   e   r   1   5   ;   u   n   i   x   ;
0000240   3   ;   3   ;   5   4  nl   S   o   m   e   N   a   m   e   ;
0000260   B   l   o   n   d   i   e   ;   s   e   r   v   e   r   1   5
0000300   ;   u   n   i   x   ;   4   ;   4   ;   6   5   :  nl   B   1
0000320  sp   n   a   m   e   ;   B   1  sp   B   m   k   ;   B   1  sp
0000340   h   o   s   t   ;   B   1  sp   a   r   c   h   ;   B   1  sp
0000360   m   e   m   ;   B   1  sp   d   v   r   ;   B   1  sp   a   r
0000400   w   ;  nl   S   o   m   e   N   a   m   e   ;   B   l   o   n
0000420   d   i   e   ;   s   e   r   v   e   r   1   2   ;   u   n   i
0000440   x   ;   1   ;   5   ;   7   6  nl   S   o   m   e   N   a   m
0000460   e   ;   B   l   o   n   d   i   e   ;   s   e   r   v   e   r
0000500   1   2   ;   u   n   i   x   ;   2   ;   6   ;   8   7  nl   S
0000520   o   m   e   N   a   m   e   ;   B   l   o   n   d   i   e   ;
0000540   s   e   r   v   e   r   1   2   ;   u   n   i   x   ;   3   ;
0000560   7   ;   8   9  nl   S   o   m   e   N   a   m   e   ;   B   l
0000600   o   n   d   i   e   ;   s   e   r   v   e   r   1   2   ;   u
0000620   n   i   x   ;   4   ;   8   ;   0   9   ;  nl
0000634

Replies are listed 'Best First'.
Re: Removing Duplicate Lines from a File
by DamnDirtyApe (Curate) on Jul 17, 2002 at 15:31 UTC

    This should print each line in my_file.txt only once:

    perl -ne 'print unless $n{$_}++' my_file.txt

    _______________
    D a m n D i r t y A p e
      But how do I embed that into a Perl program?
        If you run DamnDirtyApe's excellent one-liner through Deparse:
        perl -MO=Deparse -ne 'print unless $n{$_}++' my_file.txt
        you will see how:
        LINE: while (defined($_ = <ARGV>)) { print $_ unless $n{$_}++; }
        You can remove the LINE: label and you should add use strict.
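        For example, embedded in a standalone script it might look like this (just a sketch; the script name dedup.pl and the %seen hash name are my own choices, not part of the one-liner):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %seen;                              # lines we have already printed
        while ( my $line = <> ) {
            print $line unless $seen{$line}++; # only the first occurrence gets through
        }

        You would then run it as, say, perl dedup.pl my_file.txt > my_file.unique.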

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
        my %uniq;
        $uniq{$_}++ while <DATA>;
        print for keys %uniq;   # note: keys %uniq comes back in no particular order

        __DATA__
        B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
        SomeName;Blondie;server15;unix;1;1;32
        SomeName;Blondie;server15;unix;2;2;43
        SomeName;Blondie;server15;unix;3;3;54
        SomeName;Blondie;server15;unix;4;4;65:
        B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
        SomeName;Blondie;server12;unix;1;5;76
        SomeName;Blondie;server12;unix;2;6;87
        SomeName;Blondie;server12;unix;3;7;89
        SomeName;Blondie;server12;unix;4;8;09;

        HTH

        _________
        broquaint

Re: Removing Duplicate Lines from a File
by amphiplex (Monk) on Jul 17, 2002 at 15:44 UTC
    Hi !

    The uniq command expects sorted input, so you should do sort file | uniq or, better, sort -u file.

    As for how to use DamnDirtyApe's answer in your script:
    while (<>) {
        unless ( $n{$_}++ ) {
            # ... do something with the line ...
        }
    }
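
    For instance, filling in the blank to write the de-duplicated lines to a new file might look like this (only a sketch; the filenames input.txt and output.txt are made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %n;
    open my $in,  '<', 'input.txt'  or die "Can't read input.txt: $!";
    open my $out, '>', 'output.txt' or die "Can't write output.txt: $!";

    while ( my $line = <$in> ) {
        print {$out} $line unless $n{$line}++;   # skip lines we've seen before
    }

    close $out;
    close $in;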

    ---- amphiplex
Re: Removing Duplicate Lines from a File
by bronto (Priest) on Jul 17, 2002 at 16:01 UTC

    uniq only removes contiguous duplicate lines. The two copies of the line

    B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;

    are not adjacent in your file, so uniq won't collapse them. Instead, use DamnDirtyApe's one-liner; it is the complete program that does what you need, so you don't need to incorporate anything else.

    Ciao!
    --bronto

    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->made($love) ;
    }

      How about this hack? This way it still maintains its somewhat-sorted order. Tested on Win32.
      #!/usr/bin/perl -w
      use strict;

      my (@array, %hash);

      while (<DATA>) {
          push(@array, $_) unless defined $hash{$_};
          $hash{$_} = 1;
      }

      print join("", @array);

      __DATA__
      B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
      SomeName;Blondie;server15;unix;1;1;32
      SomeName;Blondie;server15;unix;2;2;43
      SomeName;Blondie;server15;unix;3;3;54
      SomeName;Blondie;server15;unix;4;4;65:
      B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
      SomeName;Blondie;server12;unix;1;5;76
      SomeName;Blondie;server12;unix;2;6;87
      SomeName;Blondie;server12;unix;3;7;89
      SomeName;Blondie;server12;unix;4;8;09;
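
      For what it's worth, the same order-preserving filter is commonly written with grep and a hash. A minimal sketch of that idiom (reading from whatever files are given on the command line rather than from DATA):

      #!/usr/bin/perl -w
      use strict;

      my %seen;
      print grep { !$seen{$_}++ } <>;   # keeps the first copy of each line, in order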