in reply to removing non-duplicates

Why didn't Unix uniq work? I believe uniq -u file.txt is what you're looking for. Note that the file needs to be sorted first, since uniq only compares adjacent lines:
sort file.txt | uniq -u > unique_lines.txt
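
A quick illustration of why the sort matters (sample data made up for the demo): uniq -u only drops duplicates that sit next to each other, so on unsorted input the repeated A survives:

$ printf 'A\nB\nA\nC\n' | uniq -u
A
B
A
C
$ printf 'A\nB\nA\nC\n' | sort | uniq -u
B
C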

Re^2: removing non-duplicates
by anonymized user 468275 (Curate) on Jul 12, 2005 at 09:30 UTC
    Although sort file | uniq works, why not just use...
    sort -u

    One world, one people

      Because that doesn't do what the OP wanted. Quotes from the respective man pages:
      • sort: -u outputs only the first of an equal run (i.e., all distinct rows)
      • uniq: -u only prints unique lines (i.e., rows that appear exactly once)
      [me@host tmp]$ cat /tmp/t
      A
      B
      A
      C
      [me@host tmp]$ sort -u /tmp/t
      A
      B
      C
      [me@host tmp]$ sort /tmp/t | uniq -u
      B
      C
Re^2: removing non-duplicates
by Roy Johnson (Monsignor) on Jul 11, 2005 at 21:59 UTC
    That gives you one copy of each distinct line; the OP wanted only the lines that appear exactly once. It could be done with uniq -c plus grep and cut, but by that point you just want to do it in Perl. Gar. Should have double-checked that -u option. Good answer.
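
    For completeness, a sketch of what the pure-Perl route might look like (file.txt is a stand-in name); unlike the uniq pipeline it needs no pre-sorting and keeps the surviving lines in their original order:

    # count every line, then print only those seen exactly once,
    # in first-appearance order
    perl -ne '$seen{$_}++; push @order, $_ if $seen{$_} == 1;
              END { print grep { $seen{$_} == 1 } @order }' file.txt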

    Caution: Contents may have been coded under pressure.
      heh. score one more for the *nix cmdline utils :)

      Just for the sake of argument/exercise, even if there were only -c, I would still do it on the command line (these are also handy if you want lines that show up N times, since -u only helps when N == 1; see the sketch after the code):
      # using perl (assumes this uniq puts a tab between the count and the line):
      uniq -c /tmp/d | perl -ne '($n,$s) = split(/\t/, $_, 2); print $s if $n == 1'
      # using grep/cut (make sure that's a real tab after the 1 in the egrep;
      # tab is already cut's default delimiter, so -f2 alone does it):
      uniq -c /tmp/d | egrep '^ *1	' | cut -f2
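
      And for N other than 1, a sketch assuming GNU uniq -c's output format (leading spaces, the count, one space, then the line; N = 3 is just an example):
      # print lines that occur exactly 3 times
      sort /tmp/d | uniq -c | perl -ne 'print "$2\n" if /^\s*(\d+) (.*)/ && $1 == 3'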