PerlMonks  

matching the contents of two (2) files

by stevenrh (Beadle)
on Mar 10, 2005 at 19:10 UTC ( [id://438388]=perlquestion )

stevenrh has asked for the wisdom of the Perl Monks concerning the following question:

Hello All, I am trying to compare two files and send the difference to another file. I have usernames in the first, and if one doesn't match anything in the second, I want to print/redirect that username to the third file. So far, I have some pseudo-code. My conflict lies in plain noobness. I am getting really bent on filehandles and arrays/hashes.
clarification
Essentially, I would like the interpreter to read the first line of FILE1; if its pattern is matched in FILE2, continue to line 2 of FILE1 and repeat the check on FILE2. Else, print to screen, or to another file. Here is what I have so far:
#!/usr/bin/perl -w
use strict;
open(FIRST,"first.txt") || die "$!";
@first = <FIRST>;
close FIRST;
open(LAST,"last.txt") || die "$!";
@last = <LAST>;
close LAST;
foreach $i (@first){
    if ($i =~ m/@last)
    }else{
        print "$_";
    }

Thanks in advance for any help

steven

Update:
No, this isn't homework :). A colleague and I are trying to match our userbase with the ones we've downloaded from our spam filter provider. We're trying to match/compare/contrast our username list with theirs (then we can batch upload our unfiltered and angry users' names). The file itself is under 1MB (about 40,000 lines). diff doesn't work because they don't line up exactly, and grep -v -f etc. choked on us as well. Perl seemed like my last straw.

UP-Update:

The code below satisfies my requirements for what I have to do. THANKS all!!!
--cheers

Replies are listed 'Best First'.
Re: matching the contents of two (2) files
by dragonchild (Archbishop) on Mar 10, 2005 at 19:18 UTC
    First, what does doing this buy you that using diff doesn't? Use the right tool for the right job.

    Second, you will want to use a hash, not an array. (I'm abbreviating here. Add the excellent fileopen stuff you're doing.)

    chomp( my @first = <FIRST> );
    chomp( my @last = <LAST> );
    my %last_names;
    undef @last_names{ @last };
    foreach my $name (@first ) {
        next if exists $last_names{ $name };
        print "$name\n";
    }
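For anyone who wants the abbreviated snippet above as something that runs end to end, here is a minimal sketch. The helper name `missing_names` and the sample names are made up for illustration; the demo reads from in-memory strings so it runs without any files, and for the real lists you would open first.txt and last.txt (the names from the question) instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines read from $first_fh that never appear in $last_fh.
# Both arguments are open read handles. (Hypothetical helper, not from the thread.)
sub missing_names {
    my ($first_fh, $last_fh) = @_;

    chomp( my @last = <$last_fh> );
    my %last_names;
    @last_names{@last} = ();              # only the keys matter; values stay undef

    my @missing;
    while ( my $name = <$first_fh> ) {
        chomp $name;
        next if exists $last_names{$name};    # exact, case-sensitive match
        push @missing, $name;
    }
    return @missing;
}

# Demo on in-memory data. For real files, open them the usual way, e.g.
#   open( my $first, '<', 'first.txt' ) or die "first.txt: $!";
my $first_data = "alice\nbob\ncarol\n";
my $last_data  = "bob\n";
open( my $first, '<', \$first_data ) or die "cannot open in-memory data: $!";
open( my $last,  '<', \$last_data )  or die "cannot open in-memory data: $!";

print "$_\n" for missing_names( $first, $last );
```

Each lookup is a single hash probe, so the second file is only scanned once no matter how many names are in the first. The comparison is exact: stray whitespace or a difference in case will make a name count as missing.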

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      If you're going to abbreviate, abbreviate :-)
      my %hash;
      @hash{<LAST>} = (); # Death to the undef trick!
      exists $hash{$_} or print while <FIRST>;

      Caution: Contents may have been coded under pressure.
        Let's see ... that's about 70 characters.
        #234567890#234567890#234567890#234567890
        -p BEGIN{@x{+pop}=()}exists$x{$_}||next
        That's 39 characters. You have to run it as so from the command line.
        perl -p -e 'BEGIN{@x{+pop}=()}exists$x{$_}||next' FIRST.txt LAST.txt > MISSING.txt

        This is really untested code, but it _should_ work. :-)
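A note for readers trying the one-liner above: as far as I can tell it prints every line of the first file. `+pop` stores the file name LAST.txt itself as the only hash key rather than that file's lines, and under `-p` a `next` still falls through to the implicit print. Below is a longer, ungolfed sketch of what the command line seems to be aiming for. The helper name `print_missing` and the script name missing.pl are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write to the handle $out every line of $first_path that is not also a line
# of $last_path. (print_missing is a hypothetical name, not from the thread.)
sub print_missing {
    my ($first_path, $last_path, $out) = @_;

    open( my $last, '<', $last_path ) or die "$last_path: $!";
    my %seen;
    @seen{ <$last> } = ();        # keys are the file's lines, newline included
    close $last;

    open( my $first, '<', $first_path ) or die "$first_path: $!";
    while ( my $line = <$first> ) {
        # Lines are compared with their newlines attached, so a final line
        # lacking a trailing newline will not match one that has it.
        print {$out} $line unless exists $seen{$line};
    }
    close $first;
}

# Run as: perl missing.pl FIRST.txt LAST.txt > MISSING.txt
print_missing( @ARGV[ 0, 1 ], \*STDOUT ) if @ARGV == 2;
```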


Re: matching the contents of two (2) files
by Roy Johnson (Monsignor) on Mar 10, 2005 at 20:59 UTC
Re: matching the contents of two (2) files
by bibo (Pilgrim) on Mar 10, 2005 at 19:21 UTC
    Right off the bat, I hope these aren't big files, as you could have an ugly moment when you slurp them in.

    I hope this isn't some sort of homework problem? Take a look at the grep function to search for the strings...

    cheers

    --b
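One way to read the grep suggestion above, sketched with made-up sample names: build a lookup hash from the second list, then let `grep` keep only the names that are absent from it. The helper name `missing_via_grep` is hypothetical. Grepping the second array once per name would also work, but the number of comparisons then grows with the product of the two list lengths.

```perl
use strict;
use warnings;

# Return the elements of @$first that are not in @$last.
# Takes two array references. (Hypothetical helper, not from the thread.)
sub missing_via_grep {
    my ($first, $last) = @_;
    my %in_last = map { $_ => 1 } @$last;     # name => 1 for every known name
    return grep { !$in_last{$_} } @$first;    # keep names with no entry
}

my @first = qw(alice bob carol);   # sample data, stands in for first.txt
my @last  = qw(bob dave);          # sample data, stands in for last.txt

print "$_\n" for missing_via_grep( \@first, \@last );
```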

Node Type: perlquestion [id://438388]
Approved by RazorbladeBidet