PerlMonks  

Re: Pulling out data from one file thats not in another

by kennethk (Abbot)
on Apr 27, 2010 at 14:55 UTC ( [id://837120]=note )


in reply to Pulling out data from one file thats not in another

The simple answer to your problem would appear to be diff, a standard *nix command with a GUI counterpart on Windows, WinDiff. Is there a reason you cannot use one of these rather than reinventing the wheel? If you need to do it in Perl, this is a FAQ: How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Replies are listed 'Best First'.
Re^2: Pulling out data from one file thats not in another
by Angharad (Pilgrim) on Apr 27, 2010 at 14:59 UTC
    I tried 'diff'. The trouble there is that the items in the two files don't always appear in the same order, so it simply doesn't work. I'll take a peek at the links you suggested though.
      By using a hash as per the FAQ, the intersection/difference calculation will be order-independent. You will have to compare the resulting hash (called %count in the FAQ) against a given file's content to determine which file lacked the line in question. Note that the FAQ's code fails if either array has repeated entries.
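
One way to work around the repeated-entries caveat is to de-duplicate each list before counting. The following is only a minimal sketch in the spirit of the FAQ, not the FAQ's own code: the sub names unique and compare_lists are made up for illustration, and the sample ID 1xyzA is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove repeated entries while keeping first-seen order.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Return (\@only_in_first, \@only_in_second, \@in_both) for two lists.
sub compare_lists {
    my ($first, $second) = @_;
    my @uniq_first  = unique(@$first);
    my @uniq_second = unique(@$second);

    # After de-duplication a count of 2 can only mean "present in both".
    my %count;
    $count{$_}++ for @uniq_first, @uniq_second;

    my %in_first = map { $_ => 1 } @uniq_first;
    my (@only_first, @only_second, @both);
    for my $element (unique(@uniq_first, @uniq_second)) {
        if    ($count{$element} == 2) { push @both,        $element }
        elsif ($in_first{$element})   { push @only_first,  $element }
        else                          { push @only_second, $element }
    }
    return (\@only_first, \@only_second, \@both);
}

# Sample data: out of order, with one repeat; 1xyzA is a made-up ID.
my @master    = qw(1ao8A 1jkxA 1juvA 1jkxA);
my @completed = qw(1juvA 1ao8A 1xyzA);

my ($only_master, $only_completed, $both)
    = compare_lists(\@master, \@completed);
print "master only: @$only_master\n";
print "completed only: @$only_completed\n";
print "both: @$both\n";
```

Because membership is decided by hash lookup, the order of lines in either input does not matter.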

      Alternatively, you can use bit operations rather than simple incrementation to encode a little extra info. The FAQ code structure is more immediately obvious, but this may do more of what you want:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $master    = shift;
      my $completed = shift;

      open my $mh, '<', $master or die "Open fail on $master: $!";
      my @master_lines = <$mh>;
      chomp @master_lines;

      open my $ch, '<', $completed or die "Open fail on $completed: $!";
      my @completed_lines = <$ch>;
      chomp @completed_lines;

      my %count;
      for my $element (@master_lines) {
          $count{$element} |= 1;
      }
      for my $element (@completed_lines) {
          $count{$element} |= 2;
      }

      print "$master only:\n";
      for my $element (@master_lines) {
          next if $count{$element} & 2;
          print "$element\n";
      }

      print "$completed only:\n";
      for my $element (@completed_lines) {
          next if $count{$element} & 1;
          print "$element\n";
      }
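
To make the bit encoding concrete: each line ends up with flag 1 (master only), 2 (completed only), or 3 (both), and repeats cannot change a flag that is already set. Here is a small sketch of the same idea that reports once per distinct line by iterating over the hash keys. The sub name classify is an illustrative assumption, and 1xyzA is a made-up ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash of line => bit flags: bit 1 = seen in master,
# bit 2 = seen in completed. OR-ing is idempotent, so repeats are harmless.
sub classify {
    my ($master, $completed) = @_;
    my %flag;
    $flag{$_} |= 1 for @$master;
    $flag{$_} |= 2 for @$completed;
    return \%flag;
}

my %label = (1 => 'master only', 2 => 'completed only', 3 => 'both');

# Sample data with a repeated entry; 1xyzA is hypothetical.
my @master    = qw(1ao8A 1jkxA 1jkxA 1juvA);
my @completed = qw(1juvA 1xyzA);

my $flag = classify(\@master, \@completed);
for my $line (sort keys %$flag) {
    print "$line: $label{ $flag->{$line} }\n";
}
```

Walking the hash keys rather than the original arrays means a line repeated in a file is reported only once, at the cost of losing the original file order.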

      Several tools already do what you want, so writing your own is probably unnecessary.

      A standard Unix-like solution (works under bash):
      $ diff <( sort master ) <( sort completed ) | grep '^<' | cut -d ' ' -f2-

      Depending on your needs you may want to use sort -u instead of a simple sort.

      Or if you're under some Debian-derivative distro just install the moreutils package and use combine:

      $ combine master not completed
      1ao8A
      1jkxA
      1juvA
      1mejA
      1meoA
      1n0uA
      1pjqA

      Hope that helps.

        I often use comm instead of diff, too.
