polsum has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have two files like this HTML Code:
file 1:
xxtcgtatccgaggga
cgcgcgggggagg
jjsjjjjsjjjdtcgtat
aaaaaaacccaaan
ggtcgtatffaadda
gggctggalllslllssdkk

file 2:
tcgtat
gctgga
I want to 1) match each element of file2 against each element of file1, and 2) delete the matched letters and everything after them in that element of file1. So the output should be:
xx
cgcgcgggggagg
jjsjjjjsjjjd
aaaaaaacccaaan
gg
gg
I tried with regular expressions but couldn't figure out how to match the elements of one array against another array. If possible, please provide the code in Perl, because I have just started learning Perl. Thanks in advance.

Replies are listed 'Best First'.
Re: array matching
by ww (Archbishop) on Sep 09, 2011 at 18:13 UTC
    C'mon! Firstly, what you posted isn't HTML Code.

    But far more significant, you merely assert that you tried to write the Perl to satisfy your spec... but don't show any code to support that claim. And then you start discussing array matching ... but, while we can infer arrays, you haven't shown any.

    In short, your post falls short of the mark for lack of precision; the missing demonstration of effort; and the absence of code & associated verbatim error messages, if any.

    So, I'd suggest you take one step back and read On asking for help and How do I post a question effectively? (with special attention to the fact that this is not a factory churning out free code but rather, a venue to share wisdom and help newcomers to master Perl) and the regex section of Tutorials here. You almost certainly also have perldoc perlretut as a resource, right on your own computer. You'll also find that what I think is your question is answered repeatedly here. Try hunting around the Q&A section (also available from the links just below the Monastery's banner) and maybe follow up with a Google or Super Search for a triplet of terms like Perl array matching.

    Then, come back with a fresh effort and any remaining questions (including the missing material above) and you'll likely get cheerful and expert advice.

Re: array matching
by CountZero (Bishop) on Sep 09, 2011 at 21:32 UTC
    The naive way to solve this problem would be to check each of file2's lines against each of file1's entries. Obviously that would take far too long when both files have many entries.
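    For comparison, the naive approach might look like the sketch below. It is not from the original post: truncate_at_first_match is a made-up helper name, and it uses index and substr rather than a regex, cutting each file1 entry at the earliest position where any file2 string occurs. Every entry is checked against every search string, which is where the cost comes from on large inputs.

```perl
use strict;
use warnings;

# Sample data copied from the question.
my @file1 = qw(
    xxtcgtatccgaggga cgcgcgggggagg jjsjjjjsjjjdtcgtat
    aaaaaaacccaaan ggtcgtatffaadda gggctggalllslllssdkk
);
my @file2 = qw( tcgtat gctgga );

# Hypothetical helper: return $entry cut off at the earliest
# occurrence of any needle, or unchanged if none occurs.
sub truncate_at_first_match {
    my ($entry, @needles) = @_;
    my $cut = length $entry;
    for my $needle (@needles) {
        my $pos = index $entry, $needle;
        $cut = $pos if $pos >= 0 && $pos < $cut;
    }
    return substr $entry, 0, $cut;
}

# One pass over file1, and for each entry one pass over file2.
my @result = map { truncate_at_first_match($_, @file2) } @file1;
print "$_\n" for @result;
```

    On the sample data this prints the six lines the question asks for.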

    A better way is to craft a regex that combines all the entries of file2. Regexp::Assemble does that for you in an easy and efficient way. Once you have your regular expression made, just apply your regex to each line of file1.

    use Modern::Perl;
    use Regexp::Assemble;

    my @searches = qw/tcgtat gctgga/;

    # Combine every search string (plus the rest of the line) into one regex.
    my $ra = Regexp::Assemble->new;
    $ra->add( "$_.*" ) for @searches;
    $ra = $ra->re;

    while (<DATA>) {
        chomp;
        s/$ra//;
        say if $_;
    }

    __DATA__
    xxtcgtatccgaggga
    cgcgcgggggagg
    jjsjjjjsjjjdtcgtat
    aaaaaaacccaaan
    ggtcgtatffaadda
    gggctggalllslllssdkk
    Output:
    xx
    cgcgcgggggagg
    jjsjjjjsjjjd
    aaaaaaacccaaan
    gg
    gg

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      The documentation for Regexp::Assemble says:

      "Note that Perl's own regular expression engine will implement trie optimisations in perl 5.10"

      So a simple alternation may be just as fast now.
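      A minimal sketch of that plain-alternation variant, assuming the same sample data as above. The variable names are illustrative, and the quotemeta call is a defensive addition that the sample strings do not strictly need. Whether it actually matches Regexp::Assemble for speed would have to be benchmarked on real data.

```perl
use strict;
use warnings;

my @searches = qw( tcgtat gctgga );

# Join the search strings with '|'; quotemeta escapes any regex
# metacharacters (the sample strings contain none).
my $alternation = join '|', map { quotemeta($_) } @searches;

# Any of the search strings, followed by the rest of the line.
my $re = qr/(?:$alternation).*/;

my @entries = qw(
    xxtcgtatccgaggga cgcgcgggggagg jjsjjjjsjjjdtcgtat
    aaaaaaacccaaan ggtcgtatffaadda gggctggalllslllssdkk
);

# Work on a copy so @entries itself is left untouched.
my @trimmed = map { my $copy = $_; $copy =~ s/$re//; $copy } @entries;
print "$_\n" for @trimmed;
```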

Re: array matching
by Kc12349 (Monk) on Sep 09, 2011 at 19:09 UTC

    Here is some half-hearted sample code. In this case I assume you have read the lines of each file into @file1_lines and @file2_lines. I also assume you've run chomp on the arrays. This is a reasonable approach if you know your files are a manageable size and you don't mind loading them entirely into memory. In practice I would recommend a while loop to read any file of unknown size whenever possible.

    My guess, given your expected output, is that when you see any of the strings in file2 in file1, you would like to delete that string and the rest of the line in file1. I've tried to be extra verbose here in showing this replace, and pushing the results to @output_array.

    $replace_pattern is a pattern built from the strings in file2 to match any of those strings to the end of the line. The sample code produces your desired output sample.

    use strict;
    use warnings;
    use feature 'say';

    my @file1_lines = qw(
        xxtcgtatccgaggga cgcgcgggggagg jjsjjjjsjjjdtcgtat
        aaaaaaacccaaan ggtcgtatffaadda gggctggalllslllssdkk
    );
    my @file2_lines = qw( tcgtat gctgga );

    # Builds "tcgtat.*|gctgga.*": any file2 string plus the rest of the line.
    my $replace_pattern = join('.*|', @file2_lines) . '.*';

    my @output_array;
    for my $line (@file1_lines) {
        my $output_line = $line;
        $output_line =~ s/$replace_pattern//;
        push @output_array, $output_line;
    }

    say for @output_array;
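    The while-loop reading recommended above for files of unknown size might look roughly like this. It is only a sketch: strip_from_match is a hypothetical helper name, and the two files are simulated with in-memory filehandles so the example runs as-is. With real files you would open them by path instead. The file2 strings are still loaded in full, on the assumption that the pattern list is the small one.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two files (assumption: one entry per line).
my $file1_text = join '', map { "$_\n" } qw(
    xxtcgtatccgaggga cgcgcgggggagg jjsjjjjsjjjdtcgtat
    aaaaaaacccaaan ggtcgtatffaadda gggctggalllslllssdkk
);
my $file2_text = "tcgtat\ngctgga\n";

# Read the search strings and build the pattern once.
open my $patterns_fh, '<', \$file2_text or die "Cannot read patterns: $!";
chomp( my @patterns = <$patterns_fh> );
close $patterns_fh;

my $alternation = join '|', map { quotemeta($_) } @patterns;
my $re = qr/(?:$alternation).*/;

# Hypothetical helper: process the input one line at a time, so only
# a single line of file1 is held in memory at any moment.
sub strip_from_match {
    my ($in_fh, $out_fh, $pattern) = @_;
    while ( my $line = <$in_fh> ) {
        chomp $line;
        $line =~ s/$pattern//;
        print {$out_fh} "$line\n";
    }
    return;
}

open my $in_fh, '<', \$file1_text or die "Cannot read input: $!";
strip_from_match($in_fh, \*STDOUT, $re);
close $in_fh;
```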