
Now that you've provided an indication of your data, a much better solution (than my earlier tentative suggestion) presents itself.

Assuming you have a filehandle, e.g. $matches_fh, to your file of match data (Genomes_used_Hant.txt in your example), and another, e.g. $fasta_fh, to your FASTA data (NRT2.txt in your example), you can capture the wanted counts like this:

my $alt = join '|', reverse sort <$matches_fh>;  # reversed, so e.g. XYZ12 is tried before XYZ1
my $re  = qr{(?x: ^ > ( $alt ) )};               # /x ignores the newlines left in $alt
my %count;
/$re/ && ++$count{$1} while <$fasta_fh>;
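Two details of that code are easy to miss. Perl tries the alternatives in $alt from left to right and keeps the first one that matches, which is why the names are reverse-sorted: a name then always comes before any shorter name that is its prefix. Also, the lines read from the match file still end in newlines, and the (?x: ... ) group makes the regex engine ignore that whitespace, so no chomp is needed. The standalone sketch below uses three made-up input lines, not your files, to show the difference the ordering makes:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Lines as read from a match file still carry their trailing newlines.
my @lines = ("XYZ\n", "XYZ1\n", "XYZ12\n");

# Alternatives are tried left to right and the first success wins,
# so the order of the alternation decides which name is captured.
my $asc  = join '|', sort @lines;            # XYZ, XYZ1, XYZ12 (with newlines)
my $desc = join '|', reverse sort @lines;    # XYZ12, XYZ1, XYZ (with newlines)

# Under /x (here via (?x: ... )) whitespace in the pattern, including
# the interpolated newlines, is ignored.
my ($asc_hit)  = '>XYZ12_1' =~ qr{(?x: ^ > ( $asc ) )};
my ($desc_hit) = '>XYZ12_1' =~ qr{(?x: ^ > ( $desc ) )};

print "ascending:  $asc_hit\n";    # the short name XYZ wins
print "descending: $desc_hit\n";   # the full name XYZ12 wins
```

If any of your names could ever contain regex metacharacters (or '#', which starts a comment under /x), chomping them and passing them through quotemeta before the join would be a sensible precaution; chomp first, because quotemeta would otherwise escape the newline and make it significant.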

The code you've presented in a couple of your posts in this thread uses the 3-argument form of open with lexical filehandles: this is very good. You are not, however, checking for I/O errors: this is not good at all. The easiest method is to let Perl do this checking for you with the autodie pragma; the alternative is to do it yourself, as shown in the open documentation.
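Here is a minimal sketch of both approaches. It creates a throwaway file with File::Temp so that it runs on its own; the file names are placeholders, not your real data files. Note that autodie is lexically scoped, so it only affects code inside the block where it is used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "XYZ1\n";
close $tmp_fh;

# Option 1: autodie makes open/close die on failure within this block.
my $first_line;
{
    use autodie;
    open my $fh, '<', $tmp_name;
    $first_line = <$fh>;
    close $fh;
}

# A failed open under autodie throws an autodie::exception object.
my $autodie_caught = 0;
{
    use autodie;
    eval { open my $fh, '<', "$tmp_name.missing" };
    $autodie_caught = 1
        if ref $@ && $@->isa('autodie::exception') && $@->matches('open');
}

# Option 2: check open's return value yourself and report $!.
my $manual_error = '';
unless (open my $fh, '<', "$tmp_name.missing") {
    $manual_error = "Can't open '$tmp_name.missing': $!";
}
print "$manual_error\n";
```

In a real script you would normally let the autodie exception (or your own die "...: $!") terminate the program rather than catching it; the eval here only demonstrates what gets thrown.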

In the test code below, I've used Inline::Files purely for convenience. The count information is in %count: you can format and output this however you want.

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dump;
use Inline::Files;

my $alt = join '|', reverse sort <MATCHES>;
my $re = qr{(?x: ^ > ( $alt ) )};
my %count;
/$re/ && ++$count{$1} while <FASTA>;

dd \%count;

__MATCHES__
Gloin1
XYZ1
XYZ
XYZ12
__FASTA__
>Gloin1_1
unwanted data
>XYZ_1
unwanted data
>XYZ12_1
unwanted data
>XYZ1_2
unwanted data
>XYZ1_1
unwanted data
>XYZ12_3
unwanted data
>Gloin1_2
unwanted data
>XYZ12_2
unwanted data

Output:

{ Gloin1 => 2, XYZ => 1, XYZ1 => 2, XYZ12 => 3 }
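Because %count only gets a key when a FASTA header actually matches, a name from the match file with no entries will not appear in it at all. If you want such names reported as zero, one option is to keep the list of names and fall back to 0 when formatting. A minimal sketch, with the counts copied from the output above and a hypothetical extra name, ABC, that never matched:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Counts copied from the output above; ABC is a hypothetical name that
# appears in the match file but never in the FASTA headers.
my %count = (Gloin1 => 2, XYZ => 1, XYZ1 => 2, XYZ12 => 3);
my @names = qw(Gloin1 XYZ1 XYZ XYZ12 ABC);

# One line per name, sorted; '// 0' reports unmatched names as zero.
my @report = map { sprintf "%-8s %d", $_, $count{$_} // 0 } sort @names;
print "$_\n" for @report;
```

In your real code, @names would come from chomping the lines of the match file, which means reading it into an array first rather than consuming it directly in the join.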

— Ken