Finding Mismatches

Abhisek has asked for the wisdom of the Perl Monks concerning the following question:

```perl
/usr/bin/perl -w
for (my $x1 = 0; $x1 <= $#RAM1; $x1++) {
   for (my $y1 = 0; $y1 <= $#{$RAM1[$x1]}; $y1++) {
      if(($RAM[$x1][$y1]) eq ($RAM1[$x1][$y1]))
      {
         if($x1==0){
         }
         #print "DO NOTHING\n";
      }
      else{
         push(@shyam,$y1);
         print "Mismatches \$y1=$y1 and $RAM[$x1][$y1] and $RAM1[$x1][$y1]\n";
         my $lenfirst=length($AoA[$x1][$y1]);
         $lenfinal=35-$lenfirst;
         my $spacelen=" " x "$lenfinal";
         push(@inde,{"$y1" => "$RAM[$x1][$y1]"."$spacelen"."\|$RAM1[$x1][$y1]"});
      }
   }
}
```

What would get stored in @shyam in this code? Does $y1 tell you which column does not match?
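To see concretely what @shyam collects, here is a minimal, self-contained run of the same comparison on two small arrays. The data is hypothetical, invented only for illustration, and the loop is rewritten with `use strict` and ranges instead of C-style counters. @shyam receives only the column index $y1 of each mismatch, so mismatches in different rows cannot be told apart; pushing `[$x1, $y1]` pairs (the @positions array here is an added, hypothetical name) keeps the row as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: two 2-D arrays of the same shape.
my @RAM  = ( [ 'a', 'b', 'c' ],
             [ 'd', 'e', 'f' ] );
my @RAM1 = ( [ 'a', 'X', 'c' ],
             [ 'Y', 'e', 'f' ] );

my @shyam;       # column indices only, as in the original loop
my @positions;   # [row, column] pairs, so the row is not lost

for my $x1 ( 0 .. $#RAM1 ) {
    for my $y1 ( 0 .. $#{ $RAM1[$x1] } ) {
        # Skip cells that match; only mismatches are recorded.
        next if $RAM[$x1][$y1] eq $RAM1[$x1][$y1];
        push @shyam,     $y1;
        push @positions, [ $x1, $y1 ];
    }
}

print "shyam: @shyam\n";    # prints "shyam: 1 0"
print 'positions: ',
    join( ', ', map { sprintf '(%d,%d)', @$_ } @positions ), "\n";
# prints "positions: (0,1), (1,0)"
```

Both mismatches land in @shyam as bare column numbers (1 from row 0, 0 from row 1); only @positions shows which row each came from.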

Re: Finding Mismatches by graff (Chancellor) on Feb 12, 2008 at 08:24 UTC
The script should start like this:

```perl
#!/usr/bin/perl -w
use strict;
```

(note the initial "#!")

You have a `$AoA[$x1][$y1]` in there, but I think this should be either "RAM" or "RAM1" instead of "AoA".

Pushing values of $y1 onto a @shyam array seems pointless -- if you want to keep track of where the mismatches are, you probably want to save $x1 as well as $y1.

You seem to be building an array of hashes in @inde, but each hash has only one key/value pair, with "$y1" as the hash key. Are you planning on adding more key/value things later? If not, you might as well be pushing plain old strings onto a plain old array. (Maybe you really want "inde" to be a HoH, keyed by $x1 and $y1?)

It's not clear what sort of result you really want as your output, and you didn't provide any sample input, so I'm not sure what else to say about the code, except that it could be written in a way that would be easier to read (and you can probably use sprintf to get the string format that you want for @inde):

```perl
# ... after declaring and loading values into @RAM and @RAM1...
my %inde;
my $rownum = 0;
for my $row ( @RAM ) {
    my $row1 = $RAM1[$rownum];
    my $colnum = 0;
    for my $col ( @$row ) {
        if ( $col ne $row1->[$colnum] ) {
            printf( "Mismatch at row %d col %d: RAM=%s vs. RAM1=%s\n",
                    $rownum, $colnum, $col, $row1->[$colnum] );
            $inde{$rownum}{$colnum} = sprintf( "%17s \| %-17s", $col, $row1->[$colnum] );
        }
        $colnum++;
    }
    $rownum++;
}
# do something with %inde...
```

If I were looking for diffs between a couple of 2-D arrays, that's sort of how I would do it, I think.
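The sprintf suggestion can be checked against the question's hand-computed padding. The sketch below uses hypothetical values and assumes the intent of the original code was to left-justify the RAM value in a 35-character column (the question measures `$AoA[$x1][$y1]`, which graff suspects should be RAM or RAM1). Under that assumption, a `%-35s` field produces the same string as computing the padding with `length` and the `x` operator, without the separate length arithmetic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values standing in for $RAM[$x1][$y1] and $RAM1[$x1][$y1].
my ( $left, $right ) = ( 'abc', 'xyz' );

# Manual padding, as in the question, but measuring the RAM value itself
# (an assumption about what $AoA was meant to be).
my $lenfinal = 35 - length $left;
my $manual   = $left . ( ' ' x $lenfinal ) . '|' . $right;

# Same layout with a field width: %-35s left-justifies the value
# in a 35-character column.
my $formatted = sprintf '%-35s|%s', $left, $right;

print $manual eq $formatted ? "same\n" : "different\n";   # prints "same"
```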
Re: Finding Mismatches by hipowls (Curate) on Feb 12, 2008 at 08:29 UTC
You will be best off doing the comparison before the split, but let's assume you have been given the arrays and don't have access to the original data. How large are these arrays? A naive comparison will take time proportional to N**2, so I'd be inclined to convert the data into a more usable form by using a hash function to generate a key for each row.

```perl
my %lookup_row;
foreach my $row ( @RAM ) {
    my $hash = hash($row);
    if ( defined $lookup_row{$hash} ) {
        push @{ $lookup_row{$hash} }, $row;
    }
    else {
        $lookup_row{ hash($row) } = [$row];
    }
}
foreach my $row ( @RAM2 ) {
    if ( exists $lookup_row{ hash($row) } ) {
        # potential match
        # check for collision
    }
}
sub hash {
    # something that makes sense for your data
    # probably join in this case.
}
```

For large data sets (for some definition of large) this will run faster.
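The reply leaves `hash` as a stub ("probably join in this case") and the collision check as a comment. One way to fill both in is sketched below; it is an illustration under stated assumptions, not hipowls's own code. It assumes the goal is to find rows of @RAM1 (the second loop above calls it @RAM2) that occur anywhere in @RAM, regardless of position. A hypothetical `row_key` stands in for `hash` and joins the fields with "\0", which is assumed never to appear in the data; each candidate is then confirmed field by field, so a key collision cannot produce a false match. Building %lookup_row takes one pass over @RAM, and each @RAM1 row then costs one hash lookup rather than a scan of every @RAM row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: which rows of @RAM1 also occur somewhere in @RAM?
my @RAM  = ( [ 'a', 'b' ], [ 'c', 'd' ], [ 'e', 'f' ] );
my @RAM1 = ( [ 'c', 'd' ], [ 'x', 'y' ] );

# Hypothetical key function: join the fields with a separator assumed
# not to occur in the data. Substitute whatever suits your data.
sub row_key {
    my ($row) = @_;
    return join "\0", @$row;
}

# Build the lookup once: each key maps to the @RAM rows with that key.
# push autovivifies the array ref, so no defined/else branch is needed.
my %lookup_row;
push @{ $lookup_row{ row_key($_) } }, $_ for @RAM;

my ( @matched, @unmatched );
ROW:
for my $row (@RAM1) {
    my $candidates = $lookup_row{ row_key($row) } || [];
    for my $cand (@$candidates) {
        # Confirm field by field in case two different rows share a key.
        if ( @$cand == @$row
            && !grep { $cand->[$_] ne $row->[$_] } ( 0 .. $#$row ) )
        {
            push @matched, $row;
            next ROW;
        }
    }
    push @unmatched, $row;
}

print 'unmatched: ', join( '; ', map { join ' ', @$_ } @unmatched ), "\n";
# prints "unmatched: x y"
```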