in reply to How do i use regexes in one file to match FASTA sequences in another file
It boils down to the question of how to represent the matches and the number of matches efficiently.
I created some files with simplified test input to concentrate on the problem:

regexes.txt

    ID1>>^a
    ID2>>h$
    ID3>>b
    ID4>>[a-z]{9,10}
    ID5>>[ah]

lines.txt

    id_A: abcdefg
    id_B: bcdefgh
    id_C: cdefghijk

You will probably have to change the "split" statements to match the format of your input.
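As a rough sketch of how the split calls might be adapted: passing a limit of 2 keeps any later occurrence of the separator inside the second field, and a pattern such as /:\s*/ tolerates a variable amount of whitespace after the ID. The helper names parse_regex_line and parse_input_line are hypothetical and not part of the script in this post; the sample strings are taken from the test files above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.

# "ID4>>[a-z]{9,10}" -> ("ID4", "[a-z]{9,10}")
# The limit of 2 keeps a later ">>" inside the regex itself.
sub parse_regex_line {
    my ($text) = @_;
    chomp $text;
    return split />>/, $text, 2;
}

# "id_C: cdefghijk" -> ("id_C", "cdefghijk")
# /:\s*/ accepts any amount of whitespace after the colon.
sub parse_input_line {
    my ($text) = @_;
    chomp $text;
    return split /:\s*/, $text, 2;
}

my ($rid, $regex) = parse_regex_line("ID4>>[a-z]{9,10}\n");
my ($lid, $seq)   = parse_input_line("id_C: cdefghijk\n");
print "$rid => $regex\n";
print "$lid => $seq\n";
```

For real FASTA input the ID and the sequence sit on separate lines, so the parsing step would need more than a single split; this only covers the one-line test format used here.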
I am storing the matches in a hash that uses the regular expressions as keys and array references of matches as values.
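The structure described above can be sketched in isolation, assuming the regexes and lines are already in memory. The function name collect_matches is hypothetical; the sample data is a subset of the test files. The sketch relies on autovivification, so the array reference does not need to be created by hand, and the number of matches is simply the array length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each regex string
# to an array ref of the lines it matched.
sub collect_matches {
    my ($regexes, $lines) = @_;
    my %matches;
    for my $regex (@$regexes) {
        my $compiled = qr/$regex/;    # compile once per regex
        for my $line (@$lines) {
            # push autovivifies the array ref on first use
            push @{ $matches{$regex} }, $line if $line =~ $compiled;
        }
    }
    return \%matches;
}

my @regexes = ('^a', 'h$', '[a-z]{9,10}');
my @lines   = qw(abcdefg bcdefgh cdefghijk);

my $matches = collect_matches(\@regexes, \@lines);
for my $regex (sort keys %$matches) {
    my $hits = $matches->{$regex};
    print "$regex: ", scalar(@$hits), " match(es): @$hits\n";
}
```

A regex that never matches gets no key at all, so there is no need to skip empty arrays when reporting.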
Update: added autodie; warnings are already active because of -w.

    #!/usr/bin/perl -w
    use strict;
    use autodie;

    # Read the "ID>>regex" lines into a hash: ID => regex.
    open(my $regexefile, "<", "regexes.txt");
    my @regexes = <$regexefile>;
    chomp @regexes;
    my %regexes = map { split(/>>/, $_) } @regexes;

    # regex => array ref of the lines it matched
    my %matches;

    open(my $inputfile, "<", "lines.txt");
    while (<$inputfile>) {
        chomp;
        # Each input line looks like "id_A: abcdefg"; keep the part after the ID.
        my (undef, $line) = split(/ /, $_);
        next unless defined $line;
        foreach my $regex (values %regexes) {
            if ($line =~ /$regex/) {
                # Dereference explicitly: push on a bare reference no longer
                # works in current Perls. The array is autovivified on first use.
                push(@{ $matches{$regex} }, $line);
            }
        }
    }

    while (my ($regex, $matches) = each(%matches)) {
        print "$regex: No of matches " . scalar @$matches . "\n";
        foreach my $match (@$matches) {
            print "matched $match\n";
        }
    }
Re^2: How do i use regexes in one file to match FASTA sequences in another file
by Kenosis (Priest) on Nov 22, 2013 at 19:30 UTC