in reply to removing non-duplicates

Why didn't Unix uniq work? I believe uniq -u file.txt is what you're looking for. Note that the file needs to be sorted first, since uniq only compares adjacent lines:
sort file.txt | uniq -u > unique_lines.txt
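
A quick illustration of why the sort matters (sample data made up for the demo): uniq -u only drops duplicates that sit next to each other, so on unsorted input the repeated A survives:

$ printf 'A\nB\nA\nC\n' | uniq -u
A
B
A
C
$ printf 'A\nB\nA\nC\n' | sort | uniq -u
B
C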

Re^2: removing non-duplicates
by anonymized user 468275 (Curate) on Jul 12, 2005 at 09:30 UTC
    Although sort file | uniq works, why not just use...
    sort -u

    One world, one people

      Because that doesn't do what the OP wanted. Quotes from the respective man pages:
      • sort: -u outputs only the first of an equal run (i.e., all distinct rows)
      • uniq: -u only prints unique lines (i.e., rows that appear exactly once)
      [me@host tmp]$ cat /tmp/t
      A
      B
      A
      C
      [me@host tmp]$ sort -u /tmp/t
      A
      B
      C
      [me@host tmp]$ sort /tmp/t | uniq -u
      B
      C
Re^2: removing non-duplicates
by Roy Johnson (Monsignor) on Jul 11, 2005 at 21:59 UTC
    That gives you one copy of each distinct line; the OP wanted only the lines that appear exactly once. It could be done with uniq -c plus grep and cut, but by that point you just want to do it in Perl. Gar. Should have double-checked that -u option. Good answer.
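
    For completeness, a sketch of what the pure-Perl route might look like (file.txt is a stand-in name); unlike the uniq pipeline it needs no pre-sorting and keeps the surviving lines in their original order:

    # count every line, then print only those seen exactly once,
    # in first-appearance order
    perl -ne '$seen{$_}++; push @order, $_ if $seen{$_} == 1;
              END { print grep { $seen{$_} == 1 } @order }' file.txt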

    Caution: Contents may have been coded under pressure.
      heh. score one more for the *nix cmdline utils :)

      Just for the sake of argument/exercise, even if there were only -c, I would still do it on the command line (these are also handy if you want lines that show up N times, since -u only helps when N == 1; see the sketch after the code):
      # using perl (assumes this uniq puts a tab between the count and the line):
      uniq -c /tmp/d | perl -ne '($n,$s) = split(/\t/, $_, 2); print $s if $n == 1'
      # using grep/cut (make sure that's a real tab after the 1 in the egrep;
      # tab is already cut's default delimiter, so -f2 alone does it):
      uniq -c /tmp/d | egrep '^ *1	' | cut -f2
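
      And for N other than 1, a sketch assuming GNU uniq -c's output format (leading spaces, the count, one space, then the line; N = 3 is just an example):
      # print lines that occur exactly 3 times
      sort /tmp/d | uniq -c | perl -ne 'print "$2\n" if /^\s*(\d+) (.*)/ && $1 == 3'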