in reply to Column Comparison of a File in Perl

Problem #1

This code reads your input one line at a time and prints only the lines you require (the "A id" header and the "M" lines that follow it), without reading the whole file into memory at once. To save the result, redirect the output to a file.

use strict;
use warnings;

my $flag = 0;
while (<DATA>) {
    if (/^A id/) {
        print;
        $flag = 1;
    }
    print if $flag and /^M/;
}

__DATA__
A class 1 1 1 1 1 1
A id 12 12 15 15 16 16
B class 0 0 0 0 0 0
B id 0 0 0 0 0 0 0
C id 0 0 0 0 0 0 0
M X1 1 2 2 2 2 2
M X2 2 1 1 1 1 1
M X3 1 2 2 2 2 2
M X4 2 2 2 1 1 2
M X5 1 1 1 1 1 1
M X6 1 1 1 2 2 1
M X7 1 1 1 1 1 1
M X8 1 1 1 1 1 1
M X9 1 1 1 1 1 1
M X10 1 1 1 1 1 1
M X11 1 2 1 1 1 1
M X12 2 2 2 2 2 2
M X13 2 1 2 2 2 2
M X14 2 1 2 2 2 2
M X15 1 2 1 1 1 1
M X16 1 1 2 2 2 2
M X17 1 2 2 2 2 2
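
If redirecting the output from the shell is inconvenient, the same filter can be pointed at explicit filehandles instead. This is only a sketch: the sub name `filter_lines` and the shortened sample data are my own, and in-memory filehandles stand in for real files so that it runs as-is.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): copy the "A id" header
# and every "M" line after it from $in to $out, one line at a time.
sub filter_lines {
    my ($in, $out) = @_;
    my $flag = 0;
    while (my $line = <$in>) {
        if ($line =~ /^A id/) {
            print {$out} $line;
            $flag = 1;
        }
        print {$out} $line if $flag and $line =~ /^M/;
    }
    return;
}

# In-memory filehandles keep the sketch self-contained. With real files you
# would open them by name instead (the names are up to you).
my $input  = "A class 1 1\nA id 12 12\nB id 0 0\nM X1 1 2\nM X2 2 1\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
filter_lines($in, $out);
close $in;
close $out;

print $output;
```

Passing filehandles rather than file names keeps the loop identical to the `DATA` version and still reads only one line at a time.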

Problem #2

I'm trying to understand your comparison requirements. Could you elaborate?

Update: Here's a guess...

use strict;
use warnings;

while (<DATA>) {
    next if /^A id/;
    my ($x, $c1, $c2, @cols) = (split)[1..7];
    print "$x: col 12, 1st: $c1\n";
    for my $col (@cols) {
        if ($c1 == $col) {
            print " matches\n";
        }
        else {
            print " does not match\n";
        }
    }
    print "$x: col 12, 2nd: $c2\n";
    for my $col (@cols) {
        if ($c2 == $col) {
            print " matches\n";
        }
        else {
            print " does not match\n";
        }
    }
}

__DATA__
A id 12 12 15 15 16 16
M X1 1 2 2 2 2 2
M X2 2 1 1 1 1 1
M X3 1 2 2 2 2 2
M X4 2 2 2 1 1 2

Prints...
X1: col 12, 1st: 1
 does not match
 does not match
 does not match
 does not match
X1: col 12, 2nd: 2
 matches
 matches
 matches
 matches
X2: col 12, 1st: 2
 does not match
 does not match
 does not match
 does not match
X2: col 12, 2nd: 1
 matches
 matches
 matches
 matches
X3: col 12, 1st: 1
 does not match
 does not match
 does not match
 does not match
X3: col 12, 2nd: 2
 matches
 matches
 matches
 matches
X4: col 12, 1st: 2
 matches
 does not match
 does not match
 matches
X4: col 12, 2nd: 2
 matches
 does not match
 does not match
 matches

Replies are listed 'Best First'.
Re^2: Column Comparison of a File in Perl
by snape (Pilgrim) on Jan 19, 2010 at 22:06 UTC
    Thanks a lot for the reply, and sorry for the bad explanation. Since both columns contain only two values, 1 or 2, I would like to compare values that are identical in the same row position but sit in different columns. For example:
    A id 12(F.C.) 12(S.C.) 15(F.C.) 15(S.C.) 16(F.C.) 16(S.C.)
    M X1 1 2 2 2 2 2
    M X2 2 1 1 1 1 1
    M X3 1 2 2 2 2 2
    This is a subset of the data above, where F.C. stands for First Column and S.C. for Second Column (added to be more descriptive). Here we see that the second column of 12 is identical to the first and second columns of 15 and 16. I would therefore like to find the longest stretch over which two columns are identical, and to do the same for all the other columns. Note that I can't compare the first column of 12 with the second column of 12.
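
One possible reading of this requirement, sketched below, is: for every pair of columns that belong to different ids, count the longest run of consecutive rows in which the two columns hold the same value. The sub name `longest_run`, the zero-based column indices and the hard-coded rows (the three-row subset quoted above) are assumptions made for illustration, so treat this as a starting point rather than a finished answer.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the longest run of consecutive rows in
# which columns $i and $j (zero-based) of @$rows hold the same value.
sub longest_run {
    my ($rows, $i, $j) = @_;
    my ($best, $run) = (0, 0);
    for my $row (@$rows) {
        # Extend the current run on a match, reset it on a mismatch.
        $run  = $row->[$i] == $row->[$j] ? $run + 1 : 0;
        $best = $run if $run > $best;
    }
    return $best;
}

# Ids from the "A id" line and the values from the three "M" rows above.
my @ids  = (12, 12, 15, 15, 16, 16);
my @rows = (
    [1, 2, 2, 2, 2, 2],    # M X1
    [2, 1, 1, 1, 1, 1],    # M X2
    [1, 2, 2, 2, 2, 2],    # M X3
);

for my $i (0 .. $#ids) {
    for my $j ($i + 1 .. $#ids) {
        # Never compare the two columns that share an id (e.g. 12 with 12).
        next if $ids[$i] == $ids[$j];
        printf "column %d (id %d) vs column %d (id %d): longest run %d\n",
            $i, $ids[$i], $j, $ids[$j], longest_run(\@rows, $i, $j);
    }
}
```

For real input, `@ids` and `@rows` would be filled from the file with `split`, as in the earlier code, instead of being written out by hand.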