Re^3: Counting matches

Now that you've provided an indication of your data, a much better solution (than my earlier tentative suggestion) presents itself.

Assuming you have a filehandle, e.g. $matches_fh, to your file of match data (Genomes_used_Hant.txt in your example), and another, e.g. $fasta_fh, to your fasta data (NRT2.txt in your example), you can capture the wanted counts like this:

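# reverse sort puts longer IDs before their own prefixes (XYZ12, XYZ1, XYZ).
# The newline left on each ID is ignored because the pattern uses (?x: ...).
my $alt = join '|', reverse sort <$matches_fh>;
my $re = qr{(?x: ^ > ( $alt ) )};
my %count;
/$re/ && ++$count{$1} while <$fasta_fh>;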
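The snippet above reads the IDs without chomp and relies on the x modifier to ignore the trailing newlines. That is fine for IDs like the ones in your sample, but an ID containing whitespace, a '#' or a regex metacharacter would change the pattern. If that is a concern, here is a minimal sketch of a more defensive variant. The sub names build_id_regex and count_ids are hypothetical, made up for this illustration, and the sort is by length rather than the reverse sort used above:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build one anchored regex from a list of ID lines.
sub build_id_regex {
    my @ids = @_;
    chomp @ids;                      # drop the trailing newlines
    my $alt = join '|',
        map  { quotemeta $_ }        # treat every ID as literal text
        sort { length($b) <=> length($a) or $a cmp $b }   # longest first
        grep { length $_ } @ids;     # skip empty lines
    return qr{^>($alt)};
}

# Hypothetical helper: count how many header lines start with each ID.
sub count_ids {
    my ($re, @lines) = @_;
    my %count;
    for my $line (@lines) {
        ++$count{$1} if $line =~ $re;
    }
    return \%count;
}

my $re    = build_id_regex("Gloin1\n", "XYZ1\n", "XYZ\n", "XYZ12\n");
my $count = count_ids($re, ">XYZ12_1\n", "unwanted data\n", ">XYZ_1\n", ">XYZ1_2\n");
print "$_ => $count->{$_}\n" for sort keys %$count;
```

With real files you would pass <$matches_fh> and <$fasta_fh> instead of the literal lists; for a large fasta file, looping over the filehandle line by line (as in the original snippet) avoids reading it all into memory.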

The code you've presented in a couple of your posts in this thread uses the 3-argument form of open with lexical filehandles: this is very good. You are not, however, checking for I/O errors: this is not good at all. The easiest method is to let Perl do this checking for you with the autodie pragma; the alternative is to do this yourself, as shown in the open documentation.
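To make the two options concrete, here is a small sketch. The sub names open_with_autodie and open_with_check, and the file name no_such_file.txt, are hypothetical and exist only for this demonstration; the path is built in a temporary directory so the open is certain to fail:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use File::Temp qw{tempdir};

# A path that cannot exist: a made-up name inside a fresh temporary directory.
my $missing = tempdir(CLEANUP => 1) . '/no_such_file.txt';

# Option 1: autodie (lexically scoped to this sub) makes a failed open die.
sub open_with_autodie {
    my ($path) = @_;
    use autodie;
    open my $fh, '<', $path;
    return $fh;
}

# Option 2: check the return value of open yourself and report $!.
sub open_with_check {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "Can't open '$path' for reading: $!";
    return $fh;
}

# Both forms raise an exception for the missing file; eval traps it here
# only so that the demonstration can show both messages.
for my $opener (\&open_with_autodie, \&open_with_check) {
    eval { $opener->($missing); 1 } or print "Caught: $@\n";
}
```

In a real script you would normally put a single "use autodie;" near the top, next to strict and warnings, and leave out the eval so that the script stops at the first I/O failure.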

In the test code below, I've used Inline::Files purely for convenience. The count information is in %count: you can format and output this however you want.

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dump;
use Inline::Files;

my $alt = join '|', reverse sort <MATCHES>;
my $re = qr{(?x: ^ > ( $alt ) )};
my %count;
/$re/ && ++$count{$1} while <FASTA>;

dd \%count;

__MATCHES__
Gloin1
XYZ1
XYZ
XYZ12
__FASTA__
>Gloin1_1
unwanted data
>XYZ_1
unwanted data
>XYZ12_1
unwanted data
>XYZ1_2
unwanted data
>XYZ1_1
unwanted data
>XYZ12_3
unwanted data
>Gloin1_2
unwanted data
>XYZ12_2
unwanted data

Output:

{ Gloin1 => 2, XYZ => 1, XYZ1 => 2, XYZ12 => 3 }
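If you want something other than the Data::Dump output, one possible format is a tab-separated line per ID. This is only a sketch: format_counts is a hypothetical name, and %count is filled in by hand here with the values from the output above so that the example stands alone:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Values copied from the test output above, so this runs on its own.
my %count = (Gloin1 => 2, XYZ => 1, XYZ1 => 2, XYZ12 => 3);

# Hypothetical helper: one "ID<TAB>count" line per ID, sorted by ID.
sub format_counts {
    my ($count) = @_;
    return map { "$_\t$count->{$_}\n" } sort keys %$count;
}

print format_counts(\%count);
```

Sorting by count instead would only need a different sort block, e.g. sort { $count->{$b} <=> $count->{$a} } keys %$count.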

— Ken
