http://qs1969.pair.com?node_id=837122


in reply to Re: Pulling out data from one file thats not in another
in thread Pulling out data from one file thats not in another

I tried 'diff'; the trouble there is that the items in the two files don't always appear in the same order, so it simply doesn't work. I'll take a peek at the links you suggested, though.

Re^3: Pulling out data from one file thats not in another
by kennethk (Abbot) on Apr 27, 2010 at 15:14 UTC
    By using a hash as per the FAQ, the intersection/difference calculation will be order-independent. You will have to compare the resulting hash (called %count in the FAQ) against a given file's content to determine which file lacked the line in question. Note that the FAQ's code fails if either array has repeat entries.

    Alternatively, you can use bit operations rather than simple incrementation to encode a little extra info. The FAQ code structure is more immediately obvious, but this may do more of what you want:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $master    = shift;
    my $completed = shift;

    open my $mh, '<', $master or die "Open fail on $master: $!";
    my @master_lines = <$mh>;
    chomp @master_lines;

    open my $ch, '<', $completed or die "Open fail on $completed: $!";
    my @completed_lines = <$ch>;
    chomp @completed_lines;

    my %count;

    for my $element (@master_lines) {
        $count{$element} |= 1;
    }

    for my $element (@completed_lines) {
        $count{$element} |= 2;
    }

    print "$master only:\n";
    for my $element (@master_lines) {
        next if $count{$element} & 2;
        print "$element\n";
    }

    print "$completed only:\n";
    for my $element (@completed_lines) {
        next if $count{$element} & 1;
        print "$element\n";
    }
Re^3: Pulling out data from one file thats not in another
by rubasov (Friar) on Apr 27, 2010 at 15:32 UTC

    There are already several tools that do what you want, so writing your own is probably unnecessary.

    A standard Unix-like solution (works under bash):
    $ diff <( sort master ) <( sort completed ) | grep '^<' | cut -d ' ' -f2-

    Depending on your needs you may want to use sort -u instead of a simple sort.
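    To illustrate the difference, here is a minimal sketch with made-up sample data (the contents of master and completed below are hypothetical). It writes sorted copies to temporary files instead of using <( ... ), so it also runs in a plain POSIX shell; otherwise it is the same diff | grep | cut pipeline.

```shell
#!/bin/sh
# Hypothetical sample data: "a" appears twice in master.
tmp=$(mktemp -d)
printf '%s\n' b a c a > "$tmp/master"
printf '%s\n' c       > "$tmp/completed"

# Plain sort keeps the duplicate, so the missing line "a" is reported twice.
sort "$tmp/master"    > "$tmp/master.sorted"
sort "$tmp/completed" > "$tmp/completed.sorted"
plain=$(diff "$tmp/master.sorted" "$tmp/completed.sorted" | grep '^<' | cut -d ' ' -f2-)

# sort -u collapses duplicates first, so each missing line is reported once.
sort -u "$tmp/master"    > "$tmp/master.uniq"
sort -u "$tmp/completed" > "$tmp/completed.uniq"
unique=$(diff "$tmp/master.uniq" "$tmp/completed.uniq" | grep '^<' | cut -d ' ' -f2-)

printf 'with sort:\n%s\nwith sort -u:\n%s\n' "$plain" "$unique"
rm -rf "$tmp"
```

    Which one you want depends on whether a repeated entry in master should count as one pending item or several.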

    Or if you're under some Debian-derivative distro just install the moreutils package and use combine:

    $ combine master not completed
    1ao8A
    1jkxA
    1juvA
    1mejA
    1meoA
    1n0uA
    1pjqA

    Hope that helps.

      I often use comm instead of diff, too.
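      For reference, a minimal sketch of the comm approach, assuming two files named master and completed as in the thread (the sample contents are made up). comm expects both inputs to be sorted in the same collation order, so the sketch pins LC_ALL=C for both sort and comm.

```shell
#!/bin/sh
# Use one collation order for sort and comm so they agree on ordering.
LC_ALL=C
export LC_ALL

# Hypothetical sample data.
tmp=$(mktemp -d)
printf '%s\n' 1pjqA 1ao8A 1jkxA > "$tmp/master"
printf '%s\n' 1zzzA 1pjqA       > "$tmp/completed"

sort "$tmp/master"    > "$tmp/master.sorted"
sort "$tmp/completed" > "$tmp/completed.sorted"

# -2 drops lines found only in the second file, -3 drops lines common to both,
# leaving the lines that appear only in master.
master_only=$(comm -23 "$tmp/master.sorted" "$tmp/completed.sorted")

# -1 drops lines found only in the first file, leaving those only in completed.
completed_only=$(comm -13 "$tmp/master.sorted" "$tmp/completed.sorted")

printf 'master only:\n%s\ncompleted only:\n%s\n' "$master_only" "$completed_only"
rm -rf "$tmp"
```

      Unlike the diff pipeline, comm needs no grep or cut afterwards, because the column options select the wanted lines directly.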