Re: array matching

The naive way to solve this problem would be to check each line of file1 against each entry of file2, one at a time. With n lines and m entries that is n * m separate searches, which takes far too long when both files have many entries.
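For comparison, here is a minimal sketch of that naive approach in core Perl. It assumes, as the Regexp::Assemble code in this post does, that the goal is to cut each line at its first match. The sample data and the `naive_strip` name are hypothetical, and `index` stands in for a regex because the search terms are plain literals.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample data standing in for file1 (lines) and file2 (terms).
my @lines    = qw(xxtcgtatccgaggga cgcgcgggggagg jjsjjjjsjjjdtcgtat);
my @searches = qw(tcgtat gctgga);

# Naive approach: for every line, scan it once per search term and
# remember the earliest position where any term occurs.
sub naive_strip {
    my ( $lines, $terms ) = @_;
    my @out;
    for my $line (@$lines) {
        my $cut = length $line;    # no match yet: keep the whole line
        for my $term (@$terms) {
            my $pos = index $line, $term;
            $cut = $pos if $pos >= 0 && $pos < $cut;
        }
        push @out, substr( $line, 0, $cut );
    }
    return @out;
}

# Print only the lines that still have something left.
say for grep { length } naive_strip( \@lines, \@searches );
```

Every line is scanned once per search term, which is the repeated work a single combined pattern is meant to avoid.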

A better way is to build a single regex that combines all the entries of file2. Regexp::Assemble does this for you easily and efficiently. Once the regular expression is built, apply it to each line of file1.

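use Modern::Perl;        # enables strict, warnings and say
use Regexp::Assemble;

# Search terms (the entries of file2).
my @searches = qw/tcgtat gctgga/;

# Combine all terms into one pattern; the trailing .* makes each match
# run from the first hit to the end of the line.
my $ra = Regexp::Assemble->new;
$ra->add( "$_.*" ) for @searches;
my $re = $ra->re;

# Lines of file1 (read from __DATA__ here): delete everything from the
# first match onwards and print whatever is left.
while (<DATA>) {
    chomp;
    s/$re//;
    say if length;       # length, not truth: a line of "0" is still printed
}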

__DATA__
xxtcgtatccgaggga
cgcgcgggggagg
jjsjjjjsjjjdtcgtat
aaaaaaacccaaan
ggtcgtatffaadda
gggctggalllslllssdkk

Output:

xx
cgcgcgggggagg
jjsjjjjsjjjd
aaaaaaacccaaan
gg
gg

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James


Re^2: array matching by Anonymous Monk on Sep 09, 2011 at 23:35 UTC
The documentation for Regexp::Assemble says: "Note that Perl's own regular expression engine will implement trie optimisations in perl 5.10". So on perl 5.10 and later, a simple alternation of the search strings may be just as fast.
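A minimal sketch of that suggestion in core Perl, with no CPAN modules. It assumes the search terms are literal strings, so `quotemeta` escapes them before they are joined into an alternation; the `strip_from_match` name is hypothetical. Whether it actually matches Regexp::Assemble's speed on real data is something to benchmark rather than assume.

```perl
use strict;
use warnings;
use feature 'say';

my @searches = qw(tcgtat gctgga);

# Plain alternation of the escaped terms. On perl 5.10 and later the
# engine can compile an alternation of literal strings into a trie.
my $alt = join '|', map { quotemeta } @searches;
my $re  = qr/(?:$alt).*/;

# Return a copy of the line with everything from the first match removed.
sub strip_from_match {
    my ($line) = @_;
    ( my $copy = $line ) =~ s/$re//;
    return $copy;
}

my @lines = qw(xxtcgtatccgaggga cgcgcgggggagg jjsjjjjsjjjdtcgtat
               aaaaaaacccaaan ggtcgtatffaadda gggctggalllslllssdkk);

say for grep { length } map { strip_from_match($_) } @lines;
```

On the sample data from the post this prints the same six lines as the Regexp::Assemble version.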