array matching

polsum has asked for the wisdom of the Perl Monks concerning the following question:

Re: array matching by ww (Archbishop) on Sep 09, 2011 at 18:13 UTC
C'mon! Firstly, what you posted isn't HTML code. But far more significant, you merely assert that you tried to write the Perl to satisfy your spec... but don't show any code to support that claim. And then you start discussing array matching... but, while we can infer arrays, you haven't shown any. In short, your post falls short of the mark for lack of precision, the missing demonstration of effort, and the absence of code and associated verbatim error messages, if any. So, I'd suggest you take one step back and read On asking for help and How do I post a question effectively? (with special attention to the fact that this is not a factory churning out free code but rather a venue to share wisdom and help newcomers to master Perl) and the regex section of Tutorials here. You almost certainly also have `perldoc perlretut` as a resource, right on your own computer. You'll also find that what I think is your question has been answered repeatedly here. Try hunting around the Q&A section (also available from the links just below the Monastery's banner) and maybe follow up with a Google or Super Search for a triplet of terms like `Perl array matching`. Then, come back with a fresh effort and any remaining questions (including the missing material above) and you'll likely get cheerful and expert advice.
Re: array matching by CountZero (Bishop) on Sep 09, 2011 at 21:32 UTC
The naive way to solve this problem would be to check each of file2's lines against each of file1's entries. Obviously that would take far too long when both files have many entries. A better way is to craft a regex that combines all the entries of file2. Regexp::Assemble does that for you in an easy and efficient way. Once you have your regular expression made, just apply your regex to each line of file1.

```perl
use Modern::Perl;
use Regexp::Assemble;

my @searches = qw/tcgtat gctgga/;

# Combine every search string (plus the rest of the line) into one regex
my $ra = Regexp::Assemble->new;
$ra->add( "$_.*" ) for @searches;
$ra = $ra->re;

while (<DATA>) {
    chomp;
    s/$ra//;      # drop the match and everything after it
    say if $_;    # skip lines that end up empty
}
__DATA__
xxtcgtatccgaggga
cgcgcgggggagg
jjsjjjjsjjjdtcgtat
aaaaaaacccaaan
ggtcgtatffaadda
gggctggalllslllssdkk
```

Output:

```
xx
cgcgcgggggagg
jjsjjjjsjjjd
aaaaaaacccaaan
gg
gg
```

CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^2: array matching by Anonymous Monk on Sep 09, 2011 at 23:35 UTC
The documentation for Regexp::Assemble says: "Note that Perl's own regular expression engine will implement trie optimisations in perl 5.10" So a simple alternation may be just as fast now.
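For anyone who wants to try the plain alternation suggested here, the following is a minimal sketch in core Perl, with no CPAN modules. The helper names `build_truncate_regex` and `truncate_lines` are made up for illustration, and the sample data is borrowed from the replies in this thread. It assumes the search strings are meant literally (hence `quotemeta`) and that the input lines are already chomped. No timing was done, so whether it really keeps up with Regexp::Assemble on large inputs is untested.

```perl
use strict;
use warnings;

# Build one regex that matches any of the search strings plus the rest
# of the line. quotemeta keeps each search string literal.
sub build_truncate_regex {
    my @searches = @_;
    my $alternation = join '|', map { quotemeta($_) } @searches;
    return qr/(?:$alternation).*/;
}

# Return copies of the given lines with the first match (and everything
# after it) removed. The input lines are left untouched.
sub truncate_lines {
    my ($re, @lines) = @_;
    my @result;
    for my $line (@lines) {
        (my $copy = $line) =~ s/$re//;
        push @result, $copy;
    }
    return @result;
}

my $re  = build_truncate_regex(qw(tcgtat gctgga));
my @out = truncate_lines($re, qw(
    xxtcgtatccgaggga cgcgcgggggagg jjsjjjjsjjjdtcgtat
    aaaaaaacccaaan ggtcgtatffaadda gggctggalllslllssdkk
));
print "$_\n" for @out;
```

Keeping the `.*` outside the group leaves the alternation as a list of plain literals, which is presumably the shape the trie optimisation is aimed at. Unlike the Regexp::Assemble example, this sketch also prints lines that end up empty.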
Re: array matching by Kc12349 (Monk) on Sep 09, 2011 at 19:09 UTC
Here is some half-hearted sample code. In this case I assume you have read the lines of each file into `@file1_lines` and `@file2_lines`. I also assume you've run `chomp` on the arrays. This is a reasonable approach if you know your files are a manageable size and you don't mind loading them entirely into memory. In practice I would recommend a while loop to read any file of unknown size whenever possible.

My guess, given your expected output, is that when you see any of the strings from file2 in a line of file1, you would like to delete that string and the rest of the line. I've tried to be extra verbose here in showing this replace, and in pushing the results to `@output_array`. `$replace_pattern` is a pattern built from the strings in file2 to match any of those strings through to the end of the line. The sample code produces your desired output sample.

```perl
use strict;
use warnings;
use feature 'say';

my @file1_lines = qw(
    xxtcgtatccgaggga
    cgcgcgggggagg
    jjsjjjjsjjjdtcgtat
    aaaaaaacccaaan
    ggtcgtatffaadda
    gggctggalllslllssdkk
);
my @file2_lines = qw( tcgtat gctgga );

# Each alternative is a file2 string followed by the rest of the line
my $replace_pattern = join('.*|', @file2_lines) . '.*';

my @output_array;
for my $line (@file1_lines) {
    my $output_line = $line;
    $output_line =~ s/$replace_pattern//;
    push @output_array, $output_line;
}

say for @output_array;
```
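The while-loop reading recommended above could look roughly like the sketch below, which reads one line at a time instead of slurping file1 into an array. It is only an illustration under a few assumptions: `regex_from_handle` and `filter_handle` are hypothetical helper names, the search strings are treated as literals, and in-memory filehandles stand in for the real file1 and file2 so the example runs on its own. With real files you would open them by path instead.

```perl
use strict;
use warnings;

# Read search strings (one per line) from a filehandle and build a single
# regex matching any of them plus the rest of the line.
sub regex_from_handle {
    my ($fh) = @_;
    my @searches;
    while (my $line = <$fh>) {
        chomp $line;
        push @searches, quotemeta($line) if length $line;
    }
    # An empty alternation would match (and wipe) every line, so refuse it.
    die "no search strings found\n" unless @searches;
    my $alternation = join '|', @searches;
    return qr/(?:$alternation).*/;
}

# Stream lines from $in, strip the match and everything after it,
# and write each resulting line to $out.
sub filter_handle {
    my ($re, $in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/$re//;
        print {$out} "$line\n";
    }
    return;
}

# Stand-ins for the contents of file2 and file1 (sample data from this thread)
my $file2 = "tcgtat\ngctgga\n";
my $file1 = "xxtcgtatccgaggga\ncgcgcgggggagg\nggtcgtatffaadda\n";

open my $pat_fh, '<', \$file2 or die "cannot open patterns: $!";
my $re = regex_from_handle($pat_fh);
close $pat_fh;

open my $in_fh, '<', \$file1 or die "cannot open input: $!";
filter_handle($re, $in_fh, \*STDOUT);
close $in_fh;
```

Only the search strings from file2 are held in memory, so the size of file1 no longer matters. Lines that become empty are still printed here; add an `if length $line` to the `print` to skip them.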