OK, thanks for posting some data. In future, please post your data inside <code></code> tags, just as you would code.

The following solution leans on a few lightweight modules to do most of the work. Since you are testing scientific results, I thought it appropriate to use part of Perl's testing framework. Test::Differences compares two data structures and, if they are not identical, reports where they differ.

The script writes the test results (each test is named for the file it checks) to a composite log file, and writes the failure diagnostics (a diff of the two files) to an individual log for each file. The only thing I don't like is that you are left with zero-byte failure logs for files that had no failures.
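On those zero-byte logs: one way round them is to sweep them up afterwards. Here is a minimal sketch using only core modules. The name `remove_empty_logs` is made up for illustration and is not part of the script in this post, and the demo works in a scratch directory so it is safe to run as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Glob qw/ bsd_glob /;
use File::Temp qw/ tempdir /;

# Hypothetical helper (an assumption, not part of the posted script):
# delete every zero-byte plain file matching a glob pattern and
# return the names of the files that were found empty.
sub remove_empty_logs {
    my ($pattern) = @_;
    my @empty = grep { -f $_ && -z $_ } bsd_glob($pattern);
    unlink @empty;
    return @empty;
}

# Demo in a scratch directory: one empty log, one with content
my $dir = tempdir( CLEANUP => 1 );
my %content = (
    'test_failure.do.re.mi.fa.so.la.ti.1.txt' => '',
    'test_failure.do.re.mi.fa.so.la.ti.2.txt' => "# Failed test\n",
);
for my $name ( sort keys %content ) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $content{$name};
    close $fh or die "Cannot close $name: $!";
}

my @removed = remove_empty_logs("$dir/test_failure.*.txt");
my @kept    = bsd_glob("$dir/test_failure.*.txt");

print "removed: $_\n" for @removed;
print "kept:    $_\n" for @kept;
```

I would run something like this as a separate step once the test script has exited, rather than from inside it: Test::Builder writes its closing "Looks like you failed ..." line to whichever failure log is current when the script ends, so the last log can still gain content at exit.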

The script assumes you want to strip the filenames as shown; change the regexp to suit. It also assumes the reciprocal files live in a directory called 'Recip' and the original blast files in one called 'Lab'; change those to suit as well.
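If you want to check the filename handling on its own before pointing it at real data, the substitutions can be tried in isolation. This is only a sketch: `lab_path_for` and `failure_log_for` are names invented for the example, and the suffix pattern here has its dots escaped and is anchored to the end of the name, which is a little stricter than a bare `.` would be.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a reciprocal file path to the path of the
# original (lab results) file by swapping the leading directory and
# removing the literal '.Recip.blast.top' suffix.
sub lab_path_for {
    my ($rcp_file) = @_;
    ( my $org_file = $rcp_file ) =~ s/^Recip/Lab/;
    $org_file =~ s/\.Recip\.blast\.top$//;
    return $org_file;
}

# Hypothetical helper: name the per-file failure log, dropping the
# 'Lab/' directory part so the log lands in the current directory.
sub failure_log_for {
    my ($org_file) = @_;
    ( my $err_log = "test_failure.$org_file.txt" ) =~ s/Lab\///;
    return $err_log;
}

my $rcp = 'Recip/do.re.mi.fa.so.la.ti.2.Recip.blast.top';
my $org = lab_path_for($rcp);
my $log = failure_log_for($org);

print "reciprocal file: $rcp\n";
print "original file:   $org\n";
print "failure log:     $log\n";
```

If your real filenames follow a different pattern, the two regexps in `lab_path_for` are the only places that should need changing.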

Here's the script:

<code>
#!/usr/bin/perl
use strict;
use warnings;

use File::Find::Rule;
use Path::Tiny qw/ path /;
use Test::More;
use Test::Differences;

# log of all tests
Test::More->builder->output( 'test_results.txt' );

# Get all the files we want to compare
my $rule = File::Find::Rule->new;
$rule->file->name('*.Recip.blast.top');
my @files = $rule->in( 'Recip' );

foreach my $rcp_file ( @files ) {

    # make a new path for the original (lab results) file and
    # strip the unwanted string from the end of the filename
    # (dots escaped and pattern anchored, so only the literal suffix goes)
    ( my $org_file = $rcp_file ) =~ s/^Recip/Lab/;
    $org_file =~ s/\.Recip\.blast\.top$//;

    # designate an individual test failure log
    ( my $err_log = "test_failure.$org_file.txt" ) =~ s/Lab\///;
    Test::More->builder->failure_output( $err_log );

    # Get the content of the two files, extract the wanted strings
    # to be compared, and store in arrays
    # (fields 1 and 5 of each '|'-delimited line are the two gi numbers)
    my @rcp_lines = path( $rcp_file )->lines({ chomp => 1 });
    @rcp_lines = map { join(' ', (split '\|')[1,5]) } @rcp_lines;

    my @org_lines = path( $org_file )->lines({ chomp => 1 });
    @org_lines = map { join(' ', (split '\|')[5,1]) } @org_lines;

    # run the tests
    eq_or_diff( \@rcp_lines, \@org_lines, $org_file);
}

done_testing;

__END__
</code>

My data files for testing:
<code>
$ cat Recip/do.re.mi.fa.so.la.ti.1.Recip.blast.top
gi|110123922|gb|EC817325.1|EC817325 gi|110095377|gb|EC788780.1|EC788780
gi|110123921|gb|EC817324.1|EC817324 gi|110105430|gb|EC798833.1|EC798833 6
gi|110123920|gb|EC817323.1|EC817323 gi|110106464|gb|EC799867.1|EC799867
</code>

<code>
$ cat Recip/do.re.mi.fa.so.la.ti.2.Recip.blast.top
gi|110123922|gb|EC817325.1|EC817325 gi|110095377|gb|EC788780.1|EC788780
gi|110123921|gb|EC817324.1|EC817324 gi|110105430|gb|EC798833.1|EC798833 6
gi|110123920|gb|EC817323.1|EC817323 gi|110106464|gb|EC799867.1|EC799867
</code>

<code>
$ cat Lab/do.re.mi.fa.so.la.ti.1
gi|110095377|gb|EC788780.1|EC788780 gi|110123922|gb|EC817325.1|EC817325
gi|110105430|gb|EC798833.1|EC798833 6 gi|110123921|gb|EC817324.1|EC817324
gi|110106464|gb|EC799867.1|EC799867 gi|110123920|gb|EC817323.1|EC817323
</code>

<code>
$ cat Lab/do.re.mi.fa.so.la.ti.2
gi|110095377|gb|EC788780.1|EC788780 gi|110123922|gb|EC817325.1|EC817325
gi|210105430|gb|EC798833.1|EC798833 6 gi|110123921|gb|EC817324.1|EC817324
gi|110106464|gb|EC799867.1|EC799867 gi|110123920|gb|EC817323.1|EC817323
</code>

And the output:
<code>
$ perl 1141288.pl
$
</code>

<code>
$ cat test_results.txt
ok 1 - Lab/do.re.mi.fa.so.la.ti.1
not ok 2 - Lab/do.re.mi.fa.so.la.ti.2
1..2
</code>

<code>
$ cat test_failure.do.re.mi.fa.so.la.ti.2.txt
#   Failed test 'Lab/do.re.mi.fa.so.la.ti.2'
#   at 1141288.pl line 40.
# +----+--------------------------+--------------------------+
# | Elt|Got                       |Expected                  |
# +----+--------------------------+--------------------------+
# |   0|[                         |[                         |
# |   1|  '110123922 110095377',  |  '110123922 110095377',  |
# *   2|  '110123921 110105430',  |  '110123921 210105430',  *
# |   3|  '110123920 110106464'   |  '110123920 110106464'   |
# |   4|]                         |]                         |
# +----+--------------------------+--------------------------+
# Looks like you failed 1 test of 2.
</code>

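In case it is not obvious why the two files compare equal when the columns are in opposite order, here is the field extraction on its own, applied to the second line of each of my first pair of sample files (with the wrapped lines rejoined). It is a sketch only: `gi_pair` is a name invented for this example, and it does the same split-and-slice as the two map blocks in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a line on '|' and join the fields at
# the requested indexes with a single space.
sub gi_pair {
    my ( $line, @idx ) = @_;
    return join ' ', ( split /\|/, $line )[@idx];
}

# Second line of the reciprocal file and of the original file
my $rcp_line = 'gi|110123921|gb|EC817324.1|EC817324 gi|110105430|gb|EC798833.1|EC798833 6';
my $org_line = 'gi|110105430|gb|EC798833.1|EC798833 6 gi|110123921|gb|EC817324.1|EC817324';

# The reciprocal file is read as fields (1,5); the original file has
# the two records the other way round, so it is read as (5,1).
my $from_rcp = gi_pair( $rcp_line, 1, 5 );
my $from_org = gi_pair( $org_line, 5, 1 );

print "from Recip: $from_rcp\n";
print "from Lab:   $from_org\n";
print $from_rcp eq $from_org ? "match\n" : "differ\n";
```

Splitting on '|' leaves everything between two pipes in one field, so the stray '6' and the second 'gi' simply end up inside fields that are never selected.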
Hope this helps!

The way forward always starts with a minimal test.

In reply to Re: How to do a reciprocal matching statement by 1nickt
in thread How to do a reciprocal matching statement by ajl412860
