Re^2: Pulling out data from one file thats not in another

http://qs1969.pair.com?node_id=837122

in reply to Re: Pulling out data from one file thats not in another
in thread Pulling out data from one file thats not in another

I tried `diff`; the trouble is that the items in the two files don't always appear in the same order, so it simply doesn't work. I'll take a peek at the links you suggested, though.


Re^3: Pulling out data from one file thats not in another by kennethk (Abbot) on Apr 27, 2010 at 15:14 UTC
By using a hash as per the FAQ, the intersection/difference calculation will be order-independent. You will have to compare the resulting hash (called `%count` in the FAQ) against a given file's content to determine which file lacked the line in question. Note that the FAQ's code fails if either array has repeat entries.

Alternatively, you can use bit operations rather than simple incrementing to encode a little extra information. The FAQ code structure is more immediately obvious, but this may do more of what you want:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expect exactly two file names on the command line.
die "Usage: $0 master completed\n" unless @ARGV == 2;
my $master    = shift;
my $completed = shift;

open my $mh, '<', $master or die "Open fail on $master: $!";
my @master_lines = <$mh>;
chomp @master_lines;

open my $ch, '<', $completed or die "Open fail on $completed: $!";
my @completed_lines = <$ch>;
chomp @completed_lines;

# Bit 1 marks "seen in master", bit 2 marks "seen in completed".
my %count;
for my $element (@master_lines) {
    $count{$element} |= 1;
}
for my $element (@completed_lines) {
    $count{$element} |= 2;
}

print "$master only:\n";
for my $element (@master_lines) {
    next if $count{$element} & 2;    # also present in completed
    print "$element\n";
}

print "$completed only:\n";
for my $element (@completed_lines) {
    next if $count{$element} & 1;    # also present in master
    print "$element\n";
}
```
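The FAQ caveat about repeat entries can be worked around by de-duplicating each list before counting. Below is a minimal sketch of that idea, not the FAQ's own code: `unique_differences` is a hypothetical helper name and the sample data is illustrative. Once each element contributes at most 1 to `%count` per list, a count of 1 means "in exactly one list" and a count of 2 means "in both".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: FAQ-style counting, made safe for repeated
# entries by de-duplicating each list first. Returns two array refs:
# elements only in the first list, and elements only in the second.
sub unique_differences {
    my ($first, $second) = @_;

    # Keep only the first occurrence of each element, preserving order.
    my %seen1;
    my @u1 = grep { !$seen1{$_}++ } @$first;
    my %seen2;
    my @u2 = grep { !$seen2{$_}++ } @$second;

    # Each element now adds at most 1 per list, so 2 means "in both".
    my %count;
    $count{$_}++ for @u1, @u2;

    my @only_first  = grep { $count{$_} == 1 } @u1;
    my @only_second = grep { $count{$_} == 1 } @u2;
    return (\@only_first, \@only_second);
}

my ($only_a, $only_b) = unique_differences(
    [qw(1ao8A 1jkxA 1jkxA 2abcA)],
    [qw(2abcA 9zzzA)],
);
print "first only: @$only_a\n";     # first only: 1ao8A 1jkxA
print "second only: @$only_b\n";    # second only: 9zzzA
```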
Re^3: Pulling out data from one file thats not in another by rubasov (Friar) on Apr 27, 2010 at 15:32 UTC
There are already several tools that do what you want, so writing your own is probably unnecessary. A standard Unix-like solution (works under bash):

```
$ diff <( sort master ) <( sort completed ) | grep '^<' | cut -d ' ' -f2-
```

Depending on your needs, you may want to use `sort -u` instead of a plain `sort`.

Or, if you're on a Debian-derived distro, just install the `moreutils` package and use `combine`:

```
$ combine master not completed
1ao8A
1jkxA
1juvA
1mejA
1meoA
1n0uA
1pjqA
```

Hope that helps.
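The choice between `sort` and `sort -u` matters when a file contains repeated lines. The Perl sketch below contrasts the two behaviours; `multiset_difference` and `set_difference` are hypothetical helper names and the data is illustrative. With plain `sort`, each copy in `completed` cancels only one copy in `master`, so surplus duplicates should still show up as `<` lines; with `sort -u` on both files, duplicates collapse and you get a plain set difference. Unlike `diff`, these subs keep `master`'s original order rather than sorted order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Multiset difference: each line of @$completed cancels at most one
# matching line of @$master (roughly what diffing the plainly sorted
# files reports as "<" lines, though diff prints them sorted).
sub multiset_difference {
    my ($master, $completed) = @_;
    my %remaining;
    $remaining{$_}++ for @$completed;
    my @out;
    for my $line (@$master) {
        if ($remaining{$line}) {
            $remaining{$line}--;    # consume one matching copy
        }
        else {
            push @out, $line;       # no copy left to cancel it
        }
    }
    return @out;
}

# Set difference: duplicates are collapsed, as with `sort -u`.
sub set_difference {
    my ($master, $completed) = @_;
    my %in_completed = map { $_ => 1 } @$completed;
    my %seen;
    return grep { !$in_completed{$_} && !$seen{$_}++ } @$master;
}

my @master    = qw(1ao8A 1ao8A 1jkxA);
my @completed = qw(1ao8A);
print join(' ', multiset_difference(\@master, \@completed)), "\n";    # 1ao8A 1jkxA
print join(' ', set_difference(\@master, \@completed)), "\n";         # 1jkxA
```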
Re^4: Pulling out data from one file thats not in another by choroba (Cardinal) on Apr 27, 2010 at 15:53 UTC
I often use `comm` instead of `diff`, too.
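`comm` expects both inputs to be sorted; something like `comm -23 <( sort master ) <( sort completed )` prints only the lines unique to `master` (`-2` and `-3` suppress the other two columns). The sort requirement exists because `comm` makes a single sort-merge pass over both inputs. A rough Perl sketch of that pass follows; `comm_columns` is a hypothetical name, and it compares strings with `cmp`, which should match `comm` under `LC_ALL=C` but may differ from locale-aware collation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the sort-merge walk behind comm. Advance
# whichever side holds the smaller line; equal lines are common.
# Returns three array refs, mirroring comm's three columns.
sub comm_columns {
    my @left  = sort @{ $_[0] };
    my @right = sort @{ $_[1] };
    my (@only_left, @only_right, @both);
    my ($i, $j) = (0, 0);
    while ($i < @left && $j < @right) {
        my $cmp = $left[$i] cmp $right[$j];
        if    ($cmp < 0) { push @only_left,  $left[$i++] }
        elsif ($cmp > 0) { push @only_right, $right[$j++] }
        else             { push @both, $left[$i++]; $j++ }
    }
    # Whatever remains on either side has no partner.
    push @only_left,  @left[$i .. $#left];
    push @only_right, @right[$j .. $#right];
    return (\@only_left, \@only_right, \@both);
}

my ($col1, $col2, $col3) = comm_columns(
    [qw(1mejA 1ao8A 1meoA)],
    [qw(1meoA 1zzzA)],
);
print "only in first:  @$col1\n";    # only in first:  1ao8A 1mejA
print "only in second: @$col2\n";    # only in second: 1zzzA
print "in both:        @$col3\n";    # in both:        1meoA
```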

In Section Seekers of Perl Wisdom