
OK, thanks for posting some data. In future, please post your data inside <code></code> tags, just as you would code.

The following solution uses some lightweight modules to do a lot of the work. Since you are testing scientific results, I thought it would be appropriate to use part of Perl's testing framework. Test::Differences compares two data structures and, if they are not identical, reports where they differ.

This script writes the test results (each test is named for the file it is testing) to a composite log file, and also writes test failure diagnostics (a diff of the two files) to an individual log for each file. (The only thing I don't like is that you are left with zero-byte failure logs for the files that had no failures.)
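If the empty logs bother you, something like the following could clear them out once the test run is over. This is only a minimal sketch using core Perl: `remove_empty_logs` is a name I made up for illustration, and I'm assuming you pass it the list of failure-log paths yourself (for example from a `glob`).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): delete failure logs
# that ended up empty. Takes a list of log paths and returns the
# paths it actually removed.
sub remove_empty_logs {
    my @logs = @_;
    my @removed;
    for my $log (@logs) {
        # -z is true only for a file that exists and has zero size
        next unless -z $log;
        if ( unlink $log ) {
            push @removed, $log;
        }
        else {
            warn "Could not remove $log: $!";
        }
    }
    return @removed;
}
```

You would call it as `remove_empty_logs( glob 'test_failure.*.txt' )`, preferably as a separate step after the test script has exited, since the test framework may still write its final summary to the last log at exit.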

It assumes you want to strip the filenames as shown; change the regexp to suit. It also assumes a directory called 'Recip' for the reciprocal files and one called 'Lab' for the original blast files; change those to suit as well.
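To make the filename handling concrete, here is a small stand-alone sketch of the mapping from a reciprocal file to its lab file. `lab_path_for` is a hypothetical name used only for this illustration, and the directory names and suffix are the ones assumed above. The dots are escaped and the pattern is anchored so that only a literal trailing suffix is removed.

```perl
use strict;
use warnings;

# Hypothetical helper: map a reciprocal-results path to the matching
# lab-results path, e.g. Recip/x.1.Recip.blast.top -> Lab/x.1
sub lab_path_for {
    my ($rcp_file) = @_;

    # swap the leading directory name
    ( my $org_file = $rcp_file ) =~ s{^Recip/}{Lab/};

    # strip the literal suffix, and only at the end of the name
    $org_file =~ s/\.Recip\.blast\.top$//;

    return $org_file;
}

print lab_path_for('Recip/do.re.mi.fa.so.la.ti.1.Recip.blast.top'), "\n";
# prints: Lab/do.re.mi.fa.so.la.ti.1
```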

#!/usr/bin/perl
use strict;
use warnings;

use File::Find::Rule;
use Path::Tiny qw/ path /;
use Test::More;
use Test::Differences;

# log of all tests
Test::More->builder->output( 'test_results.txt' );

# Get all the files we want to compare
my $rule =  File::Find::Rule->new;
$rule->file->name('*.Recip.blast.top');
my @files = $rule->in( 'Recip' );

foreach my $rcp_file ( @files ) {

  # make a new path for the original (lab results) file and
  # strip the unwanted string from the end of the filename
  ( my $org_file = $rcp_file ) =~ s/^Recip/Lab/;
  $org_file =~ s/\.Recip\.blast\.top$//;

  # designate an individual test failure log
  ( my $err_log = "test_failure.$org_file.txt" ) =~ s/Lab\///;
  Test::More->builder->failure_output( $err_log );

  # Get the content of the two files, extract the wanted strings
  # to be compared, and store in arrays
  my @rcp_lines = path( $rcp_file )->lines({ chomp => 1 });
     @rcp_lines = map { join(' ', (split '\|')[1,5]) } @rcp_lines;
  
  my @org_lines = path( $org_file )->lines({ chomp => 1 });
     @org_lines = map { join(' ', (split '\|')[5,1]) } @org_lines;

  # run the tests
  eq_or_diff( \@rcp_lines, \@org_lines, $org_file); 

}
  
done_testing;

__END__
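
The heart of the script is the split-and-slice in the two map statements: each line is split on '|' and the two gi numbers are picked out by position, in opposite order for the two files so that the resulting strings line up. A minimal stand-alone sketch of just that step follows; `gi_pair` is a hypothetical helper name, and the sample lines are taken from the test data shown below.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the two gi numbers out of one line.
# @idx gives the positions of the wanted fields after splitting on '|'.
sub gi_pair {
    my ( $line, @idx ) = @_;
    return join ' ', ( split /\|/, $line )[@idx];
}

my $rcp = 'gi|110123922|gb|EC817325.1|EC817325 gi|110095377|gb|EC788780.1|EC788780';
my $lab = 'gi|110095377|gb|EC788780.1|EC788780   gi|110123922|gb|EC817325.1|EC817325';

print gi_pair( $rcp, 1, 5 ), "\n";    # 110123922 110095377
print gi_pair( $lab, 5, 1 ), "\n";    # 110123922 110095377
```

Because the lab file lists the pair in the reverse order, taking fields 5 and 1 there (rather than 1 and 5) is what makes a reciprocal match compare as equal.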

My data files for testing:

$ cat Recip/do.re.mi.fa.so.la.ti.1.Recip.blast.top 
gi|110123922|gb|EC817325.1|EC817325 gi|110095377|gb|EC788780.1|EC788780
gi|110123921|gb|EC817324.1|EC817324 gi|110105430|gb|EC798833.1|EC798833 6
gi|110123920|gb|EC817323.1|EC817323 gi|110106464|gb|EC799867.1|EC799867

$ cat Recip/do.re.mi.fa.so.la.ti.2.Recip.blast.top 
gi|110123922|gb|EC817325.1|EC817325 gi|110095377|gb|EC788780.1|EC788780
gi|110123921|gb|EC817324.1|EC817324 gi|110105430|gb|EC798833.1|EC798833 6
gi|110123920|gb|EC817323.1|EC817323 gi|110106464|gb|EC799867.1|EC799867

$ cat Lab/do.re.mi.fa.so.la.ti.1 
gi|110095377|gb|EC788780.1|EC788780   gi|110123922|gb|EC817325.1|EC817325
gi|110105430|gb|EC798833.1|EC798833 6 gi|110123921|gb|EC817324.1|EC817324
gi|110106464|gb|EC799867.1|EC799867   gi|110123920|gb|EC817323.1|EC817323

$ cat Lab/do.re.mi.fa.so.la.ti.2
gi|110095377|gb|EC788780.1|EC788780   gi|110123922|gb|EC817325.1|EC817325
gi|210105430|gb|EC798833.1|EC798833 6 gi|110123921|gb|EC817324.1|EC817324
gi|110106464|gb|EC799867.1|EC799867   gi|110123920|gb|EC817323.1|EC817323

And the output:

$ perl 1141288.pl
$

$ cat test_results.txt 
ok 1 - Lab/do.re.mi.fa.so.la.ti.1
not ok 2 - Lab/do.re.mi.fa.so.la.ti.2
1..2

$ cat test_failure.do.re.mi.fa.so.la.ti.2.txt 
#   Failed test 'Lab/do.re.mi.fa.so.la.ti.2'
#   at 1141288.pl line 40.
# +----+--------------------------+--------------------------+
# | Elt|Got                       |Expected                  |
# +----+--------------------------+--------------------------+
# |   0|[                         |[                         |
# |   1|  '110123922 110095377',  |  '110123922 110095377',  |
# *   2|  '110123921 110105430',  |  '110123921 210105430',  *
# |   3|  '110123920 110106464'   |  '110123920 110106464'   |
# |   4|]                         |]                         |
# +----+--------------------------+--------------------------+
# Looks like you failed 1 test of 2.

Hope this helps!

The way forward always starts with a minimal test.

In reply to Re: How to do a reciprocal matching statement by 1nickt
in thread How to do a reciprocal matching statement by ajl412860
