Re^2: Pulling out data from one file thats not in another

Re^3: Pulling out data from one file thats not in another by kennethk (Abbot) on Apr 27, 2010 at 15:14 UTC
By using a hash as per the FAQ, the intersection/difference calculation will be order-independent. You will have to compare the resulting hash (called `%count` in the FAQ) against a given file's content to determine which file lacked the line in question. Note that the FAQ's code fails if either array has repeated entries. Alternatively, you can use bit operations rather than simple incrementation to encode a little extra info. The FAQ code structure is more immediately obvious, but this may do more of what you want:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $master    = shift;
    my $completed = shift;

    open my $mh, '<', $master or die "Open fail on $master: $!";
    my @master_lines = <$mh>;
    chomp @master_lines;

    open my $ch, '<', $completed or die "Open fail on $completed: $!";
    my @completed_lines = <$ch>;
    chomp @completed_lines;

    # Bit 1 marks a line seen in the master file, bit 2 a line seen in the completed file.
    my %count;
    for my $element (@master_lines) {
        $count{$element} |= 1;
    }
    for my $element (@completed_lines) {
        $count{$element} |= 2;
    }

    # Lines in master whose "completed" bit is not set.
    print "$master only:\n";
    for my $element (@master_lines) {
        next if $count{$element} & 2;
        print "$element\n";
    }

    # Lines in completed whose "master" bit is not set.
    print "$completed only:\n";
    for my $element (@completed_lines) {
        next if $count{$element} & 1;
        print "$element\n";
    }
Re^3: Pulling out data from one file thats not in another by rubasov (Friar) on Apr 27, 2010 at 15:32 UTC
There are already several tools to achieve what you want, so writing your own is probably unnecessary. A standard Unix-like solution (works under bash):

    $ diff <( sort master ) <( sort completed ) | grep '^<' | cut -d ' ' -f2-

Depending on your needs you may want to use `sort -u` instead of a simple `sort`. Or if you're under some Debian-derivative distro just install the `moreutils` package and use `combine`:

    $ combine master not completed
    1ao8A
    1jkxA
    1juvA
    1mejA
    1meoA
    1n0uA
    1pjqA

Hope that helps.
Re^4: Pulling out data from one file thats not in another by choroba (Cardinal) on Apr 27, 2010 at 15:53 UTC
I often use `comm` instead of `diff`, too.
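For readers who have not used `comm`, here is a minimal sketch of how that approach might look. It assumes the two files are named `master` and `completed` as elsewhere in the thread. The sample contents, including the IDs `1abcA` and `1zzzA`, are made up for illustration, and temporary files are used so the sketch does not depend on bash process substitution.

```shell
#!/bin/sh
# Hypothetical sample data; only the file names come from the thread.
tmpdir=$(mktemp -d)
printf '%s\n' 1ao8A 1jkxA 1abcA 1juvA > "$tmpdir/master"
printf '%s\n' 1abcA 1zzzA > "$tmpdir/completed"

# comm expects both inputs to be sorted in the same collation order,
# so sort (and de-duplicate) each file under the C locale first.
LC_ALL=C sort -u "$tmpdir/master" > "$tmpdir/master.sorted"
LC_ALL=C sort -u "$tmpdir/completed" > "$tmpdir/completed.sorted"

# -2 hides lines found only in the second file, -3 hides lines common to both,
# which leaves the lines that appear in master but not in completed.
only_in_master=$(LC_ALL=C comm -23 "$tmpdir/master.sorted" "$tmpdir/completed.sorted")
printf '%s\n' "$only_in_master"

rm -rf "$tmpdir"
```

Under bash the same idea can be written on one line as `comm -23 <(sort master) <(sort completed)`. Swapping `-23` for `-13` would list the lines found only in `completed` instead.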


	PerlMonks