Re: How do i use regexes in one file to match FASTA sequences in another file

Basically, reading regexes from a file is no different from using predefined ones.

It boils down to the question of how to represent the matches, and the number of matches, efficiently.

I created two files with simplified test input to concentrate on the problem.

regexes.txt

ID1>>^a
ID2>>h$
ID3>>b
ID4>>[a-z]{9,10}
ID5>>[ah]

lines.txt

id_A: abcdefg
id_B: bcdefgh
id_C: cdefghijk

You will probably have to change the split statements to match the format of your input.
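
For real FASTA input, a plain split on a space is usually not enough, because a record consists of a header line followed by one or more sequence lines. The following is only a minimal sketch of one way to read such records. It assumes that header lines start with ">", that the ID is the first whitespace-delimited word of the header, and that wrapped sequence lines should simply be concatenated. The helper name read_fasta and the in-memory demo data are hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle and return
# a list of [id, sequence] pairs. Assumes header lines start with ">"
# and that the ID is the first whitespace-delimited word of the header.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(\S+)/) {
            push @records, [ $1, '' ];     # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;      # append wrapped sequence lines
        }
    }
    return @records;
}

# Demo on an in-memory file so the sketch runs without extra input files.
my $fasta = ">id_A test sequence\nabcd\nefg\n>id_B\nbcdefgh\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my @records = read_fasta($fh);
close $fh;

print "$_->[0]: $_->[1]\n" for @records;
```

Each returned pair could then be matched against the regexes in the same way as the lines of lines.txt.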

I am storing the matches in a hash that uses the regexes as keys and references to arrays of matching lines as values.

#!/usr/bin/perl -w

use strict;
use autodie;

open(my $regexefile, "<", "regexes.txt");
my @regexes = <$regexefile>;
chomp @regexes;
 
my %regexes = map { split(/>>/, $_) } @regexes;

my %matches;

open(my $inputfile, "<", "lines.txt");
while (<$inputfile>) {
    
    # Strip the leading ID once per input line, not once per regex.
    my (undef, $line) = split(/ /, $_);
    while (my ($id, $regex) = each(%regexes)) 
    {
        if ( $line =~ /$regex/) {
            if (! defined($matches{$regex})) {
                $matches{$regex} = [];
            }
            chomp $line;
            push(@{ $matches{$regex} }, $line);
        }

    }

}
while (my ($regex, $matches) = each(%matches)) {
    if (!scalar @$matches) {
        next;
    }

    print "$regex: No of matches " . scalar @$matches . "\n";
    foreach my $match (@$matches) {
        print "matched $match\n";
    }
}

Update: added autodie; warnings are already active because of -w.
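
A possible variation on the script above, offered only as a sketch: compile each pattern once with qr// and key the results by the regex ID instead of by the pattern text, so the report can refer to ID1, ID2 and so on. The test data is inlined here to keep the example self-contained, and the variable names (%regex_for, @regex_lines, @input_lines) are my own assumptions rather than anything from the original files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same test data as above, inlined so the sketch is self-contained.
my @regex_lines = ('ID1>>^a', 'ID2>>h$', 'ID3>>b', 'ID4>>[a-z]{9,10}', 'ID5>>[ah]');
my @input_lines = ('id_A: abcdefg', 'id_B: bcdefgh', 'id_C: cdefghijk');

# Compile each pattern once; the limit of 2 keeps any ">>" inside a pattern intact.
my %regex_for;
for my $entry (@regex_lines) {
    my ($id, $pattern) = split />>/, $entry, 2;
    $regex_for{$id} = qr/$pattern/;
}

# Collect the matching sequences per regex ID.
my %matches;
for my $entry (@input_lines) {
    my (undef, $seq) = split / /, $entry, 2;
    for my $id (sort keys %regex_for) {
        push @{ $matches{$id} }, $seq if $seq =~ $regex_for{$id};
    }
}

# Report the number of matches and the matching sequences for each ID.
for my $id (sort keys %matches) {
    my @hits = @{ $matches{$id} };
    print "$id: ", scalar @hits, " match(es): @hits\n";
}
```

Because the array references are created on first use by push, no explicit initialisation of the hash values is needed.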

Re^2: How do i use regexes in one file to match FASTA sequences in another file by Kenosis (Priest) on Nov 22, 2013 at 19:30 UTC
Nice logic to share with OP. Consider, however, adding `use warnings; use autodie;`, the latter to handle `open` errors.
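
To illustrate the point about autodie, here is a small sketch of what it changes: with the pragma in effect, a failing open throws an exception instead of silently returning false, so the error cannot go unnoticed. The file name is hypothetical and chosen so that the open fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;

# Hypothetical file name, chosen so that open() fails.
my $missing = 'no_such_regexes.txt';

# With autodie in effect, a failed open() throws instead of returning false.
my $ok = eval {
    open(my $fh, '<', $missing);
    close $fh;
    1;
};
my $error = $@;

if (!$ok) {
    # The exception object stringifies to a readable message.
    print "open failed: $error\n";
}
```

Without the eval, the script would simply die with that message, which is usually what you want in a short script like the one above.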