johnirl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks
I am attempting to remove duplicate lines from a semicolon-delimited file. To the human eye the duplicated lines appear identical, but for some reason the uniq command doesn't see them as duplicates. I was hoping to use Perl to remove them.

Below is the file, followed by the same file run through od -ta:

B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
SomeName;Blondie;server15;unix;1;1;32
SomeName;Blondie;server15;unix;2;2;43
SomeName;Blondie;server15;unix;3;3;54
SomeName;Blondie;server15;unix;4;4;65:
B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
SomeName;Blondie;server12;unix;1;5;76
SomeName;Blondie;server12;unix;2;6;87
SomeName;Blondie;server12;unix;3;7;89
SomeName;Blondie;server12;unix;4;8;09;

0000000   B   1  sp   n   a   m   e   ;   B   1  sp   B   m   k   ;   B
0000020   1  sp   h   o   s   t   ;   B   1  sp   a   r   c   h   ;   B
0000040   1  sp   m   e   m   ;   B   1  sp   d   v   r   ;   B   1  sp
0000060   a   r   w   ;  nl   S   o   m   e   N   a   m   e   ;   B   l
0000100   o   n   d   i   e   ;   s   e   r   v   e   r   1   5   ;   u
0000120   n   i   x   ;   1   ;   1   ;   3   2  nl   S   o   m   e   N
0000140   a   m   e   ;   B   l   o   n   d   i   e   ;   s   e   r   v
0000160   e   r   1   5   ;   u   n   i   x   ;   2   ;   2   ;   4   3
0000200  nl   S   o   m   e   N   a   m   e   ;   B   l   o   n   d   i
0000220   e   ;   s   e   r   v   e   r   1   5   ;   u   n   i   x   ;
0000240   3   ;   3   ;   5   4  nl   S   o   m   e   N   a   m   e   ;
0000260   B   l   o   n   d   i   e   ;   s   e   r   v   e   r   1   5
0000300   ;   u   n   i   x   ;   4   ;   4   ;   6   5   :  nl   B   1
0000320  sp   n   a   m   e   ;   B   1  sp   B   m   k   ;   B   1  sp
0000340   h   o   s   t   ;   B   1  sp   a   r   c   h   ;   B   1  sp
0000360   m   e   m   ;   B   1  sp   d   v   r   ;   B   1  sp   a   r
0000400   w   ;  nl   S   o   m   e   N   a   m   e   ;   B   l   o   n
0000420   d   i   e   ;   s   e   r   v   e   r   1   2   ;   u   n   i
0000440   x   ;   1   ;   5   ;   7   6  nl   S   o   m   e   N   a   m
0000460   e   ;   B   l   o   n   d   i   e   ;   s   e   r   v   e   r
0000500   1   2   ;   u   n   i   x   ;   2   ;   6   ;   8   7  nl   S
0000520   o   m   e   N   a   m   e   ;   B   l   o   n   d   i   e   ;
0000540   s   e   r   v   e   r   1   2   ;   u   n   i   x   ;   3   ;
0000560   7   ;   8   9  nl   S   o   m   e   N   a   m   e   ;   B   l
0000600   o   n   d   i   e   ;   s   e   r   v   e   r   1   2   ;   u
0000620   n   i   x   ;   4   ;   8   ;   0   9   ;  nl
0000634

Replies are listed 'Best First'.
Re: Removing Duplicate Lines from a File
by DamnDirtyApe (Curate) on Jul 17, 2002 at 15:31 UTC

    This should print each line in my_file.txt only once:

    perl -ne 'print unless $n{$_}++' my_file.txt

    _______________
    D a m n D i r t y A p e
      But how do I embed that into a Perl program?
        If you run DamnDirtyApe's excellent one-liner through Deparse:
        perl -MO=Deparse -ne 'print unless $n{$_}++' my_file.txt
        you will see how:
        LINE: while (defined($_ = <ARGV>)) { print $_ unless $n{$_}++; }
        You can remove the LINE: label and you should add use strict.
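        For example, embedded in a standalone script it might look like this (just a sketch; the script name dedup.pl and the %seen hash name are my own choices, not part of the one-liner):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %seen;                              # lines we have already printed
        while ( my $line = <> ) {
            print $line unless $seen{$line}++; # only the first occurrence gets through
        }

        You would then run it as, say, perl dedup.pl my_file.txt > my_file.unique.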

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
        my %uniq;
        $uniq{$_}++ while <DATA>;
        print for keys %uniq;   # note: keys %uniq comes back in no particular order

        __DATA__
        B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
        SomeName;Blondie;server15;unix;1;1;32
        SomeName;Blondie;server15;unix;2;2;43
        SomeName;Blondie;server15;unix;3;3;54
        SomeName;Blondie;server15;unix;4;4;65:
        B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
        SomeName;Blondie;server12;unix;1;5;76
        SomeName;Blondie;server12;unix;2;6;87
        SomeName;Blondie;server12;unix;3;7;89
        SomeName;Blondie;server12;unix;4;8;09;

        HTH

        _________
        broquaint

Re: Removing Duplicate Lines from a File
by amphiplex (Monk) on Jul 17, 2002 at 15:44 UTC
    Hi !

    The uniq command expects sorted input, so you should do sort file | uniq or, better, sort -u file.

    As for how to use DamnDirtyApe's answer in your script:
    while (<>) {
        unless ( $n{$_}++ ) {
            # ... do something with the line ...
        }
    }
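
    For instance, filling in the blank to write the de-duplicated lines to a new file might look like this (only a sketch; the filenames input.txt and output.txt are made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %n;
    open my $in,  '<', 'input.txt'  or die "Can't read input.txt: $!";
    open my $out, '>', 'output.txt' or die "Can't write output.txt: $!";

    while ( my $line = <$in> ) {
        print {$out} $line unless $n{$line}++;   # skip lines we've seen before
    }

    close $out;
    close $in;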

    ---- amphiplex
Re: Removing Duplicate Lines from a File
by bronto (Priest) on Jul 17, 2002 at 16:01 UTC

    uniq only removes contiguous duplicate lines. The two copies of the line

    B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;

    are not adjacent in your file, so uniq won't collapse them. Instead, use DamnDirtyApe's one-liner; it is the complete program that does what you need, so you don't need to incorporate anything else.

    Ciao!
    --bronto

    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->made($love) ;
    }

      How about this hack? This way it still maintains its somewhat-sorted order. Tested on Win32.
      #!/usr/bin/perl -w
      use strict;

      my (@array, %hash);

      while (<DATA>) {
          push(@array, $_) unless defined $hash{$_};
          $hash{$_} = 1;
      }

      print join("", @array);

      __DATA__
      B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
      SomeName;Blondie;server15;unix;1;1;32
      SomeName;Blondie;server15;unix;2;2;43
      SomeName;Blondie;server15;unix;3;3;54
      SomeName;Blondie;server15;unix;4;4;65:
      B1 name;B1 Bmk;B1 host;B1 arch;B1 mem;B1 dvr;B1 arw;
      SomeName;Blondie;server12;unix;1;5;76
      SomeName;Blondie;server12;unix;2;6;87
      SomeName;Blondie;server12;unix;3;7;89
      SomeName;Blondie;server12;unix;4;8;09;
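
      For what it's worth, the same order-preserving filter is commonly written with grep and a hash. A minimal sketch of that idiom (reading from whatever files are given on the command line rather than from DATA):

      #!/usr/bin/perl -w
      use strict;

      my %seen;
      print grep { !$seen{$_}++ } <>;   # keeps the first copy of each line, in order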