in reply to array matching

The naive way to solve this problem would be to check each line of file1 against each entry of file2. The work grows with the product of the two file sizes, so it would take far too long when both files have many entries.
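For comparison, here is a minimal sketch of that naive approach. It assumes the task is to cut each line at the first search string it contains and to drop lines that end up empty. The sub name `strip_naive` and its calling convention are made up for illustration.

```perl
use strict;
use warnings;

# Naive approach: try every search string against every line.
# With N lines and M search strings this does N * M substring searches.
sub strip_naive {
    my ($searches, $lines) = @_;    # two array references (hypothetical interface)
    my @kept;
    for my $line (@$lines) {
        my $cut = length $line;     # default: keep the whole line
        for my $search (@$searches) {
            my $pos = index $line, $search;    # -1 when not found
            $cut = $pos if $pos >= 0 && $pos < $cut;
        }
        my $rest = substr $line, 0, $cut;
        push @kept, $rest if length $rest;     # drop lines that became empty
    }
    return @kept;
}

print "$_\n" for strip_naive(
    [qw/tcgtat gctgga/],
    [qw/xxtcgtatccgaggga cgcgcgggggagg gggctggalllslllssdkk/],
);
```

The inner loop is the part that hurts: every extra search string adds one more pass over every line.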

A better way is to craft a single regex that combines all the entries of file2. Regexp::Assemble does that for you easily and efficiently. Once the regular expression is built, just apply it to each line of file1.

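use Modern::Perl;
use Regexp::Assemble;

my @searches = qw/tcgtat gctgga/;

# Combine all search strings into one pattern; the trailing .*
# makes each match run to the end of the line.
my $ra = Regexp::Assemble->new;
$ra->add( "$_.*" ) for @searches;
my $re = $ra->re;

while (<DATA>) {
    chomp;
    s/$re//;       # cut everything from the first match onwards
    say if $_;     # print only if something (true) is left
}
__DATA__
xxtcgtatccgaggga
cgcgcgggggagg
jjsjjjjsjjjdtcgtat
aaaaaaacccaaan
ggtcgtatffaadda
gggctggalllslllssdkk
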
Output:
xx
cgcgcgggggagg
jjsjjjjsjjjd
aaaaaaacccaaan
gg
gg

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^2: array matching
by Anonymous Monk on Sep 09, 2011 at 23:35 UTC

    The documentation for Regexp::Assemble says:

    "Note that Perl's own regular expression engine will implement trie optimisations in perl 5.10"

    So a simple alternation may be just as fast now.
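A rough sketch of what that plain alternation could look like, using only core Perl. The sub names `build_alternation` and `strip_from_match` are made up for illustration, and whether this is really as fast as Regexp::Assemble on a given perl and data set is an assumption you would want to benchmark.

```perl
use strict;
use warnings;

# Join the search strings into one alternation. quotemeta keeps any
# regex metacharacters in the entries literal; the trailing .* makes
# the match run to the end of the line, as in the original pattern.
sub build_alternation {
    my @searches = @_;
    my $alt = join '|', map { quotemeta($_) } @searches;
    return qr/(?:$alt).*/;
}

# Cut each line at its first match and keep the non-empty remainders.
sub strip_from_match {
    my ($re, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        (my $copy = $line) =~ s/$re//;    # work on a copy, leave input alone
        push @kept, $copy if length $copy;
    }
    return @kept;
}

my $re = build_alternation(qw/tcgtat gctgga/);
print "$_\n" for strip_from_match(
    $re, qw/xxtcgtatccgaggga cgcgcgggggagg ggtcgtatffaadda/,
);
```

On the three sample lines this prints xx, cgcgcgggggagg and gg, matching what the Regexp::Assemble version gives for the same input.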