WWq has asked for the wisdom of the Perl Monks concerning the following question:

I have two files (file 1 and file 2). I would like to match the names that appear in both files (e.g. John06/ext, lily099/poli) and then print the unmatched data in file 1 format. I have been trying the code below, but it does not give the result I want. How can I print the unmatched data in file 1 format after matching?

file 1

ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996

ID John06/ext $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID lily099/poli $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY

ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

..

..

.

file 2

Accept => John06/ext Max

Accept => vivian788/ppr Maxcap

Accept => suzan645/pp Min

Accept => lily099/poli Max

Accept => Nick5670/uu Max

Accept => Anne309/pej Min

..

..

.

code

my ($line1,$line2,@arr1,@arr2,@arr3,@emptyarr);

@arr1 = <FILE1>;
@arr2 = <FILE2>;

foreach $line2 (@arr2) {
    if ($line2 =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)/) {
        @arr3 = @emptyarr;
        my $cname2 = "$2";
        push (@arr3, $cname2);
    }
}

foreach $line2 (@arr3) {
    foreach $line1 (@arr1) {
        if ($line1 =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
            my $cname1 = "$2";
            if ($cname1 ne $line3) {
                print NL "$cname1\n";
            }
        }
    }
}

expected result:

ID John06/ext $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID lily099/poli $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY

Replies are listed 'Best First'.
Re: Perl: How to print unmatched data after comparison of two files?
by CountZero (Bishop) on Jul 17, 2013 at 06:02 UTC
    The traditional way to do this is by putting the keys you have to check repeatedly in a hash-variable which is very fast to look-up, even when there are many items to look-up. Then you can read the other file line by line and check the keys. For example:
    use Modern::Perl;

    # Collect the names from file 2 (third whitespace-separated field).
    open my $NAMES, '<', './file2.txt' or die 'Could not open file2.txt';
    my %names;
    while (<$NAMES>) {
        chomp;
        next unless $_;
        $names{(split)[2]}=1;
    }
    close $NAMES;

    # Print every file 1 line whose name (second field) is not in the hash.
    open my $DATA, '<', './file1.txt' or die 'Could not open file1.txt';
    while (<$DATA>) {
        chomp;
        next unless $_;
        say unless $names{(split)[1]};
    }
    close $DATA;
    Output:
    ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996
    ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
    ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
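
The expected result in the question lists the file 1 lines whose names do appear in file 2, so it may help to keep both halves. Below is a minimal sketch of the same hash lookup that returns the matched and the unmatched lines separately. The sub name `partition_by_name` and the in-memory sample arrays are assumptions made for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split file 1 lines into those
# whose name (2nd field) appears in file 2 (3rd field) and those that do not.
sub partition_by_name {
    my ($file1_lines, $file2_lines) = @_;

    # Build the lookup hash from file 2 lines such as "Accept => John06/ext Max".
    my %names;
    for my $line (@$file2_lines) {
        my @fields = split ' ', $line;
        $names{ $fields[2] } = 1 if defined $fields[2];
    }

    my (@matched, @unmatched);
    for my $line (@$file1_lines) {
        my @fields = split ' ', $line;
        next unless defined $fields[1];    # skip blank or short lines
        if ($names{ $fields[1] }) {
            push @matched, $line;
        }
        else {
            push @unmatched, $line;
        }
    }
    return (\@matched, \@unmatched);
}

# Sample data copied from the question.
my @file1 = (
    'ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996',
    'ID John06/ext $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY',
    'ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY',
    'ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY',
    'ID lily099/poli $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY',
    'ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY',
);
my @file2 = (
    'Accept => John06/ext Max',
    'Accept => vivian788/ppr Maxcap',
    'Accept => suzan645/pp Min',
    'Accept => lily099/poli Max',
    'Accept => Nick5670/uu Max',
    'Accept => Anne309/pej Min',
);

my ($matched, $unmatched) = partition_by_name(\@file1, \@file2);
print "$_\n" for @$matched;
```

With the sample data this prints the John06/ext line and both lily099/poli lines, in file 1 order. Printing `@$unmatched` instead gives the three lines shown in the output above.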

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Perl: How to print unmatched data after comparison of two files?
by davido (Cardinal) on Jul 17, 2013 at 05:53 UTC

    Where does $line3 get assigned? It's tested for inequality on line 27 of the code you posted, but I don't see where it's being populated.

    use strict 'vars'; would have caught that and turned it into a compile-time error.
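
As a small illustration of that point, here is a sketch that compiles a fragment using an undeclared `$line3` under `use strict 'vars'`. The string `eval` is only there so the compile-time failure can be captured and printed instead of aborting the script; the fragment itself is a cut-down stand-in for the posted code, not a copy of it.

```perl
use strict;
use warnings;

# Compile a fragment that mentions $line3 without ever declaring it.
# q{...} does not interpolate, so the text reaches eval unchanged.
my $ok = eval q{
    use strict 'vars';
    my $cname1 = 'John06/ext';
    print "$cname1\n" if $cname1 ne $line3;    # $line3 was never declared
    1;
};

# Under strict 'vars' the fragment fails to compile, so eval returns undef
# and $@ holds a 'Global symbol "$line3" requires explicit package name' error.
my $error = $ok ? '' : $@;
print "Compile failed: $error" unless $ok;
```

Without strict, the same comparison would run quietly against an undefined value and be true for every name, so every name would be printed.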


    Dave

Re: Perl: How to print unmatched data after comparison of two files?
by mtmcc (Hermit) on Jul 17, 2013 at 07:21 UTC
    If you were forced to read the files in the order file 1, then file 2, you could store the file 1 data in a hash of arrays, with the names as keys. Something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $fileNameA = $ARGV[0];
    my $fileNameB = $ARGV[1];
    my %outputHash = ();
    my @line;
    my $x = 0;

    # Group the file 1 lines by name (second field).
    open (my $inputA, "<", $fileNameA) or die "Could not open $fileNameA: $!";
    while (<$inputA>) {
        next unless $_ =~ m/\w/;
        @line = split(" ", $_);
        push (@{$outputHash{$line[1]}}, $_);
    }

    # For each name in file 2 (third field), print the stored file 1 lines.
    open (my $inputB, "<", $fileNameB) or die "Could not open $fileNameB: $!";
    while (<$inputB>) {
        next unless $_ =~ m/\w/;
        @line = split(" ", $_);
        if (exists ${$outputHash{$line[2]}}[0]) {
            for ($x = 0; $x < @{$outputHash{$line[2]}}; $x += 1) {
                print STDERR "${$outputHash{$line[2]}}[$x]";
            }
        }
    }

    But this would only work if the names in file 2 are unique. And it's more efficient to do it the opposite way around if possible, as suggested above.
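
If file 2 might repeat a name, one way around the uniqueness caveat is to remove each group from the hash as soon as it has been emitted. This is only a sketch under that assumption: the sub name `matched_once`, the in-memory arrays, and the repeated `lily099/poli` entry in the sample file 2 data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group file 1 lines by name, then emit each group once
# as names arrive from file 2, even if file 2 repeats a name.
sub matched_once {
    my ($file1_lines, $file2_lines) = @_;

    my %lines_for;    # name => [ file 1 lines carrying that name ]
    for my $line (@$file1_lines) {
        my @fields = split ' ', $line;
        next unless defined $fields[1];
        push @{ $lines_for{ $fields[1] } }, $line;
    }

    my @out;
    for my $line (@$file2_lines) {
        my @fields = split ' ', $line;
        next unless defined $fields[2];
        # delete returns the stored array ref (or undef) and removes the key,
        # so a second occurrence of the same name finds nothing to print.
        my $group = delete $lines_for{ $fields[2] } or next;
        push @out, @$group;
    }
    return @out;
}

my @file1 = (
    'ID John06/ext $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY',
    'ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY',
    'ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY',
    'ID lily099/poli $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY',
);
my @file2 = (
    'Accept => John06/ext Max',
    'Accept => lily099/poli Max',
    'Accept => lily099/poli Min',    # repeated name, invented for this sketch
);

my @result = matched_once(\@file1, \@file2);
print "$_\n" for @result;
```

Note that the output follows the order of the names in file 2, not the order of the lines in file 1.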

    -Michael