WWq has asked for the wisdom of the Perl Monks concerning the following question:

I have two files (file 1 and file 2). I would like to match the names in both files (e.g. John06/ext, lily099/poli), and then print the lines of file 1 whose names have no match in file 2, keeping the file 1 format. I have been trying the code below, but it does not give the result I want. How can I print the unmatched data in file 1 format after matching?

file 1

ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996

ID John06/ext $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID lily099/poli $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY

ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

..

..

.

file 2

Accept => John06/ext Max

Accept => vivian788/ppr Maxcap

Accept => suzan645/pp Min

Accept => lily099/poli Max

Accept => Nick5670/uu Max

Accept => Anne309/pej Min

..

..

.

code

my ($line1,$line2,@arr1,@arr2,@arr3,@emptyarr);
@arr1 = <FILE1>;
@arr2 = <FILE2>;
foreach $line2 (@arr2) {
if ($line2 =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)/) {
@arr3 = @emptyarr;
my $cname2 = "$2";
push (@arr3, $cname2);
}
}
foreach $line2 (@arr3) {
foreach $line1 (@arr1) {
if ($line1 =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
my $cname1 = "$2";
if ($cname1 ne $line3) {
print NL "$cname1\n";
}
}
}
}

expected result:

ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996

ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY

Re: Perl: How to print unmatched data after comparison of two files?
by Athanasius (Archbishop) on Jul 17, 2013 at 03:51 UTC

    Hello WWq, and welcome to the Monastery!

    To expand on the reply from Loops:

    There are a number of problems with the code as posted. First, it is incomplete: FILE1, FILE2, NL, and especially $line3 are used without being declared or initialised. Do you use strict and use warnings at the top of your code?

    Second, the array @arr3 is cleared (by being reset to @emptyarr) each time the condition evaluates to true within the first foreach loop. So, after leaving that loop, @arr3 can never contain more than one element. That is, the second foreach loop can have at most one iteration.
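A minimal sketch of the first loop with the reset removed, so that every name from file 2 is kept. It assumes the file 2 layout shown in the question, reads from an in-memory filehandle instead of the undeclared FILE2 handle, and picks out the name with split rather than the original regex; @names is a hypothetical stand-in for @arr3.

```perl
use strict;
use warnings;

# Sample file 2 contents from the question. An in-memory filehandle
# stands in for the FILE2 handle of the original code.
my $file2 = <<'END';
Accept => John06/ext Max
Accept => vivian788/ppr Maxcap
Accept => suzan645/pp Min
Accept => lily099/poli Max
Accept => Nick5670/uu Max
Accept => Anne309/pej Min
END

open my $fh2, '<', \$file2 or die "Cannot open in-memory file: $!";

# Declared once, outside the loop, and never reset inside it,
# so it grows by one name for every line that has a name field.
my @names;
while (my $line = <$fh2>) {
    # The name is the third whitespace-separated field: "Accept => NAME ..."
    my $name = (split ' ', $line)[2];
    push @names, $name if defined $name;
}
close $fh2;

print scalar(@names), " names collected: @names\n";
```

After the loop, @names holds all six sample names, so a second loop has every name available to compare against.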

    Now to two matters of style: (1) Variables should be declared as close as possible to the point of first use. In particular, your loop variables can safely be declared like this:

    foreach my $line (@arr2) {

    (2) Code should be indented to make it easy to see where each structure begins and ends. For example:

    foreach my $line2 (@arr3) {
        foreach my $line1 (@arr1) {
            if ($line1 =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
                my $cname1 = "$2";
                if ($cname1 ne $line3) {
                    print "$cname1\n";
                }
            }
        }
    }

    It would also help the monks if you placed your data (the contents of files 1 and 2) within <code> tags, as you’ve done with the code.

    To answer your problem, we will need to know what $line3 is supposed to contain.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Perl: How to print unmatched data after comparison of two files?
by CountZero (Bishop) on Jul 17, 2013 at 06:07 UTC
    The traditional way to do this is to put the keys you need to check repeatedly into a hash, which is very fast to look up, even when there are many items. Then you can read the other file line by line and check each key against the hash.

    Rather than using a regex to extract the keys, the following example uses split:

    use Modern::Perl;
    open my $NAMES, '<', './file2.txt' or die 'Could not open file2.txt';
    my %names;
    while (<$NAMES>) {
        chomp;
        next unless $_;
        $names{(split)[2]}=1;
    }
    close $NAMES;
    open my $DATA, '<', './file1.txt' or die 'Could not open file1.txt';
    while (<$DATA>) {
        chomp;
        next unless $_;
        say unless $names{(split)[1]};
    }
    close $DATA;
    Output:
    ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996
    ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
    ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
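To illustrate why the hash look-up matters, here is a small sketch comparing a correct nested scan with the hash approach. The names are hard-coded from the question's sample data (an assumption to keep it self-contained). A file 1 name counts as unmatched only if it differs from every accepted name; the scan rechecks the whole accepted list for each name, while the hash needs one look-up per name.

```perl
use strict;
use warnings;

# Hard-coded from the question's sample data.
my @accepted    = qw(John06/ext vivian788/ppr suzan645/pp lily099/poli Nick5670/uu Anne309/pej);
my @file1_names = qw(alan135/xkr John06/ext lily099/poli sam012/pp lily099/poli Steve9018);

# Nested scan: a name is unmatched only if it differs from EVERY accepted
# name, so the whole @accepted list is rescanned for each file 1 name.
my @unmatched_scan = grep {
    my $name = $_;
    !grep { $_ eq $name } @accepted;
} @file1_names;

# Hash look-up: build the table once, then each check is a single look-up.
my %is_accepted = map { $_ => 1 } @accepted;
my @unmatched_hash = grep { !exists $is_accepted{$_} } @file1_names;

print "@unmatched_scan\n";    # alan135/xkr sam012/pp Steve9018
print "@unmatched_hash\n";    # alan135/xkr sam012/pp Steve9018
```

Both give the same answer; the difference only shows up in speed once the files grow.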

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Perl: How to print unmatched data after comparison of two files?
by Loops (Curate) on Jul 17, 2013 at 03:40 UTC

    Unfortunately, some problems have slipped through because your code doesn't use strict. If it did, you'd see that the $line3 variable is never declared, and that others lack lexical scoping.

    Update:

    Felt a bit guilty for the brevity of my original answer, so here is a quick example of code that produces the output you were expecting. It may need some tweaking as you see fit.

    use strict;
    use warnings;
    # Load the acceptable ids from file2 into a hash
    my %ids;
    open my $idfile, '<', 'file2' or die $!;
    for (<$idfile>) {
        $ids{$1} = 1 if m/Accept\s+=>\s+(\S+)/;
    }
    close $idfile;
    # Scan file1 printing each line where the ID matches one found in our hash
    open my $datafile, '<', 'file1' or die $!;
    for (<$datafile>) {
        chomp;
        my ($id) = m/ID\s+(\S+)/;
        next unless defined $id;    # skip lines without an ID (e.g. blank lines)
        print "$_\n" if exists $ids{$id};
    }
    close $datafile;

    Output:

    ID John06/ext $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
    ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
    ID lily099/poli $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY
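A common alternative for loading the ids is a hash slice, which adds every key in one statement. This sketch hard-codes the names the Accept regex would capture (an illustration only). Because the values are left undef, membership has to be tested with exists rather than by truth value; CountZero's version, which stores 1, can test the value directly.

```perl
use strict;
use warnings;

# Hypothetical list of names, as the Accept regex would capture them from file2.
my @captured = qw(John06/ext vivian788/ppr suzan645/pp lily099/poli Nick5670/uu Anne309/pej);

# Hash slice: one statement creates a key for every name.
# The values stay undef; only the presence of each key matters.
my %ids;
@ids{@captured} = ();

for my $id (qw(John06/ext sam012/pp)) {
    # A truth test ($ids{$id}) would be false for every key here, because
    # the values are undef, so membership has to be checked with exists.
    print "$id: ", (exists $ids{$id} ? 'matched' : 'unmatched'), "\n";
}
```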
      Hi Loops, thanks! =) Sorry for my mistake; I have updated the expected result. I would like to print out the unmatched data. How can I print it after the name matching?

        No worries; you just need to reverse the test that decides which lines to print. One way is to change the "if" statement modifier to "unless":

        print "$_\n" unless exists $ids{$id};
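Putting Loops's script together with this unless change, here is a self-contained sketch that prints only the unmatched lines. The sample data are embedded through in-memory filehandles so it runs without file1 and file2 on disk, and the sub name unmatched_lines is hypothetical. Lines without an ID (blank lines, the .. placeholders) are skipped explicitly; with unless, they would otherwise be reported as unmatched.

```perl
use strict;
use warnings;

# Sample data from the question, embedded so the sketch runs on its own.
# With real files you would instead open them from disk, e.g.
#   open my $datafile, '<', 'file1' or die $!;
my $file1 = <<'END';
ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996
ID John06/ext $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID lily099/poli $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY
ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
END

my $file2 = <<'END';
Accept => John06/ext Max
Accept => vivian788/ppr Maxcap
Accept => suzan645/pp Min
Accept => lily099/poli Max
Accept => Nick5670/uu Max
Accept => Anne309/pej Min
END

# Hypothetical helper: returns the file 1 lines whose ID is not accepted in file 2.
sub unmatched_lines {
    my ($data_fh, $accept_fh) = @_;

    # Load the accepted names from file 2 into a hash.
    my %ids;
    while (my $line = <$accept_fh>) {
        $ids{$1} = 1 if $line =~ m/Accept\s+=>\s+(\S+)/;
    }

    # Keep each file 1 line whose ID is NOT a key of the hash.
    my @unmatched;
    while (my $line = <$data_fh>) {
        chomp $line;
        my ($id) = $line =~ m/ID\s+(\S+)/;
        next unless defined $id;    # skip lines without an ID, e.g. blank lines
        push @unmatched, $line unless exists $ids{$id};
    }
    return @unmatched;
}

open my $fh1, '<', \$file1 or die "Cannot open file1 data: $!";
open my $fh2, '<', \$file2 or die "Cannot open file2 data: $!";
my @result = unmatched_lines($fh1, $fh2);
close $fh1;
close $fh2;

print "$_\n" for @result;
```

Run against the sample data, it prints the three lines of the expected result (alan135/xkr, sam012/pp and Steve9018).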
Re: Perl: How to print unmatched data after comparison of two files?
by 2teez (Vicar) on Jul 17, 2013 at 05:28 UTC

    You could use a hash to collect the value you want from file2 (the name), and then, using a while loop over file1, check whether the name on each line exists in that hash. If so, print the whole line of file1.
    Like this:

    use warnings;
    use strict;
    use Inline::Files;
    my %hash;
    while (<DATA2>) {
        next if /^\s*$/;
        $hash{ ( split /\s+?/, $_ )[2] } = undef;
    }
    while (<DATA1>) {
        next if /^\s*$/;
        if (/^.+?\s(.+?)\s/) {
            print $_ if exists $hash{$1};
        }
    }
    __DATA1__
    ID alan135 /xkr $work(b05bfn00un0c3) / b05bfn00un0c3; #<= b05bfn00un0d0 Size:5848.270996
    ID John06 /ext $work(b05bfn00ld0p7) / b05bfn00ld0p7; #<= b05bfn00ld0s0 Size:INFINITY
    ID lily099 /poli $work(b05bfn00ld0p7) / b05bfn00ld0p7; #<= b05bfn00ld0s0 Size:INFINITY
    ID sam012 /pp $work(b05bfn00ld0p7) / b05bfn00ld0p7; #<= b05bfn00ld0s0 Size:INFINITY
    ID lily099 /poli $wwrk(b05bfn00ld0p8) / b05bfn00ld0p8; #<= b05bfn00ld0s0 Size:INFINITY
    ID Steve9018 $work(b05bfn00ld0p7) /b05bfn00ld0p7; #<= b05bfn00ld0s0 Size:INFINITY
    __DATA2__
    Accept => John06 / ext Max
    Accept => vivian788 / ppr Maxcap
    Accept => suzan645 / pp Min
    Accept => lily099 / poli Max
    Accept => Nick5670 / uu Max
    Accept => Anne309 / pej Min
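To see which field each of the two extraction steps in the code above picks out, here is a quick check on one line from each file, written in the question's format (without spaces around the /). The variable names are made up for this sketch. Both steps yield the same name, which is why the hash look-up lines up.

```perl
use strict;
use warnings;

# One line from each file, in the question's format (single quotes keep
# the literal $work text from being interpolated).
my $data_line   = 'ID lily099/poli $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY';
my $accept_line = "Accept => lily099/poli Max\n";

# file 1: the text between the first and second whitespace characters
my ($from_regex) = $data_line =~ /^.+?\s(.+?)\s/;

# file 2: the third field when splitting on each single whitespace character
my $from_split = ( split /\s+?/, $accept_line )[2];

print "$from_regex\n";    # lily099/poli
print "$from_split\n";    # lily099/poli
```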

    Update:
    Awhoosh!!! Loops has a similar solution in his Update.
    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      Hi, thanks for your code. I applied it, but it prints out the matched data. I only need to print those unmatched data shown as expected result. How can I alter the code? Could you please explain further about hash?
        ..I only need to print those unmatched data shown as expected result..
        use negation not like so:
        ...
        while (<DATA1>) {
            next if /^\s*$/;
            if (/^.+?\s(.+?)\s/) {
                print $_ if not exists $hash{$1};    # note HERE
            }
        }
        ..

        Could you please explain further about hash?
        See perlintro
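As a quick taste before reading perlintro, here is a minimal sketch of the hash operations used in this thread: storing a value under a key, looking it up, testing with exists, deleting, and listing keys. The %size hash and its contents are illustrative only.

```perl
use strict;
use warnings;

my %size;                                  # a hash maps keys to values
$size{'alan135/xkr'} = '5848.270996';      # store a value under a key
$size{'John06/ext'}  = 'INFINITY';

print "$size{'John06/ext'}\n";             # look a value up by key: prints INFINITY

for my $name ('John06/ext', 'sam012/pp') {
    # exists asks whether the key is present in the hash
    print "$name: ", (exists $size{$name} ? 'present' : 'absent'), "\n";
}

delete $size{'John06/ext'};                # remove a key and its value
my @names = sort keys %size;               # list the keys that remain
print "@names\n";                          # prints alan135/xkr
```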