twaddlac has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!!

I am trying to write a script that matches a pattern and then removes only the 6 characters that follow or precede it. I thought I had the syntax correct (based on some printouts that I found online), but apparently I don't. When I try to run it, I get the error message: "Search pattern not terminated".

Here's part of my code:

while ($seq = $seqIn_obj->next_seq){
    push(@seq_id,$seq->id);
    $temp_seq = $seq->seq;
    while(($id, $tag_seq) = each(%tag_hash)){
        ?($temp_seq =~ s/^$tag_seq/*/i){/w,6};
    }
    $new_seq = Bio::Seq->new(-seq => $temp_seq,
                             -display_id => $seq->id,
                             -desc => $seq->desc(),
                             -alphabet => $seq->alphabet);
    $fastQ->write_seq($new_seq);
    push(@sequence,$temp_seq);
}

This is the line that is causing the error:
?($temp_seq =~ s/^$tag_seq/*/i){/w,6};
Also, if it's not too much to ask, is there a way to invert the pattern then apply it? Let me know if you need anything else from me! Thank you very much!!

Re: Regex - remove characters (pattern not terminated)
by ikegami (Patriarch) on Jun 28, 2010 at 20:13 UTC

    "?" is taken to be the start of a ?...? operator, but there's no other "?" before the end of the file to end the operator, thus the error.

    You seem to think ?( EXPR ){/w,6}; is valid Perl, but nothing about it is even close to Perl.

Re: Regex - remove characters (pattern not terminated)
by almut (Canon) on Jun 28, 2010 at 19:58 UTC

    Just guessing, but maybe something like this?

    my $temp_seq = "fooXXXXXXbar";
    my $tag_seq = "foo";
    $temp_seq =~ s/^($tag_seq)\w{6}(.*)/$1$2/;
    print $temp_seq; # "foobar"

    (removes the six characters following the search pattern "foo", which must occur at the beginning of $temp_seq)

    See s/// under "Regexp Quote-Like Operators" in perlop.

    Another way to achieve the same would be

    if ($temp_seq =~ /^$tag_seq/) {
        substr($temp_seq, $+[0], 6) = '';
    }

    where $+[0] is the position after the matched pattern.

      Or:  s/^($tag_seq)\w{6}/$1/
      Or (requires 5.10 due to use of \K):  s{ \A $tag_seq \K \w{6} }{}xms
      Or (requires 5.10 due to use of \K):  s{ \A \Q$tag_seq\E \K \w{6} }{}xms
      Or...
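
The difference between the last two variants only shows up when the tag contains regex metacharacters. A minimal sketch, assuming a hypothetical tag "A.C" and a made-up sequence (neither is from the original post):

```perl
use strict;
use warnings;

# Hypothetical tag containing a regex metacharacter ('.').
my $tag_seq = 'A.C';

# Without \Q, '.' matches any character, so "AGC" is accepted as the tag
# and the 6 characters after it are removed.
(my $loose = 'AGCTTTTTTGG') =~ s/^($tag_seq)\w{6}/$1/;

# With \Q...\E the tag is matched literally, so "AGC" does not match
# and the string is left untouched.
(my $strict = 'AGCTTTTTTGG') =~ s/^(\Q$tag_seq\E)\w{6}/$1/;

print "$loose\n";   # AGCGG
print "$strict\n";  # AGCTTTTTTGG
```

If the tags are plain nucleotide strings the two forms behave the same; \Q is only a safeguard for tags that come from outside the script.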

      Thanks for the help! However, I would like to do a little bit more than just remove 6 characters following the pattern; I would like to remove the pattern as well as the 6 preceding/following characters.

      The following did remove the 6 following characters, but I would like to integrate the pattern into the regex so that it is removed as well:
      $temp_seq =~ s/^($tag_seq)\w{6}(.*)/$1$2/;
      Any suggestions?
        "I would like to remove the pattern as well as the 6 preceding/following characters."

        Actually, that's even easier; see my other reply.

Re: Regex - remove characters (pattern not terminated)
by JavaFan (Canon) on Jun 28, 2010 at 19:55 UTC
    That's indeed a syntax error, and I've no idea what your intention is. Earlier you write "matches a pattern and then removes only 6 of the following or preceding characters", which gives some clues about what you want, but is vague enough that if I were to guess, I'd mostly guess wrong.

    Could you clarify what it is you want?

    "is there a way to invert the pattern then apply it?"
    Maybe, maybe not. What do you mean by "inverting the pattern"?
      Forgive me for not being as clear as I thought I was.
      What I have working now (as seen in the code in my last post) is a pattern that is removed (in this case it's replaced by an asterisk for testing purposes) in $temp_seq and I would like to have the preceding or following 6 characters removed as well.

      For example:
      $temp = 1234567890;
      $pattern = 123;
      $temp =~ s/$pattern/*/i;
      print "temp = $temp";
      The output I would like to see is: temp = 0

      Likewise, if the pattern removed was at the end of the string, I would like to remove the preceding 6 characters of $temp.
      As for the inversion of the pattern, consider the example mentioned earlier, where the pattern removed was found at the end of the string but it was inverted:
      $temp = 4567890321;
      $pattern = 123;
      Instead of creating a whole new variable to match the 321, is there a simpler way to reverse the $pattern variable explicitly in a regex?

      Please let me know if I need to clarify anything further, and thanks again!!
        "The output I would like to see is: temp = 0"

        As whatever matches will be replaced, just include those 6 characters in the search pattern:

        $temp =~ s/$pattern\w{6}//i;

        As for the inversion, see reverse (in scalar context it reverses the characters of a string).

        I would probably write that as:
        my $pattern = '...';
        $temp =~ s/${pattern}.{6}//s || $temp =~ s/.{6}${pattern}//s;
        Not sure you need the /i, as your example uses numbers.
        "Instead of creating a whole new variable to match the 321, is there a simpler way to reverse the $pattern variable explicitly in a regex?"
        But what if the pattern is more complicated? And why do you consider creating a "whole new variable" such a big deal? I guess if your pattern is just a simple string without characters special to the regexp engine, you could write
        $temp =~ s/${\scalar reverse $pattern}.{6}//s || $temp =~ s/.{6}${\scalar reverse $pattern}//s;
        but I wouldn't. The following is, IMO, much simpler:
        my $rpattern = reverse $pattern;
        $temp =~ s/$rpattern.{6}//s || $temp =~ s/.{6}$rpattern//s;
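
Putting the two cases from this subthread together, here is a minimal sketch run against the two example strings from the question. The helper name strip_tag is hypothetical, and the ^ and $ anchors and the \Q...\E quoting are assumptions added here because the question says the pattern sits at the beginning or the end of the string:

```perl
use strict;
use warnings;

# Remove $pattern plus the 6 characters after it when it starts the string;
# otherwise remove the reversed pattern plus the 6 characters before it
# when that ends the string. (strip_tag is a hypothetical helper name.)
sub strip_tag {
    my ($temp, $pattern) = @_;
    my $rpattern = reverse $pattern;   # scalar context: reverses the string
    $temp =~ s/^\Q$pattern\E.{6}//s
        or $temp =~ s/.{6}\Q$rpattern\E$//s;
    return $temp;
}

print strip_tag('1234567890', '123'), "\n";   # 0
print strip_tag('4567890321', '123'), "\n";   # 4
```

A string that matches neither case is returned unchanged.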
Re: Regex - remove characters (pattern not terminated) perl6
by eric256 (Parson) on Jun 30, 2010 at 13:56 UTC

    Just playing around with Perl6 some

    #!/home/ejh/rakudo/perl6
    my @temp = ("1234567", "7654321");
    my $pat = "123";
    my $rpat = "321";
    for @temp -> $test {
        print "Testing '$test' - ";
        my $t = $test;
        if $t ~~ s/$pat// or $t ~~ /$rpat/ {
            say "($t) matched";
        }
    }

    ___________
    Eric Hodges
Re: Regex - remove characters (pattern not terminated)
by twaddlac (Novice) on Jul 01, 2010 at 19:07 UTC

    Thank you for all of your help! I finally got it working! However, I have another question. I'm not very familiar with lookarounds, but I think they could increase the throughput of my program considerably!

    I would like to use a lookahead (and/or lookbehind, I don't really know the difference) assertion after matching a pattern (from a list of patterns) at either the beginning or end of the string. The lookaround would search for an adjacent pattern that is also from the list (NOTE: order does not matter!). If there is no known adjacent pattern, then remove the 6 characters preceding or following the pattern (as we have done already); if there is one, proceed with some other instruction (which is not relevant here). The following is pseudocode to help explain:

    $pattern_A = "123";
    $pattern_b = "456";
    $string_1 = "1234567890";
    $string_2 = "1236547890";
    # Then apply regex/lookaround to search for adjacent pattern
    if($adjacent_pattern == false){
        remove_pattern_and_six_following_characters;
        print("string_*");
    }
    else{
        do_something_else;
        print("string_*");
    }

    The output for $string_1 would be:
    string_1 = 1234567890
    as it had an adjacent known pattern, the program performed some other function.

    The output for $string_2 would be:
    string_2 = 0
    as it did not have an adjacent known pattern, the pattern that was matched and the following 6 characters were removed.

    Let me know if you need to know anything else!! Thank you very much!!
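
One possible way to express the pseudocode above is a negative lookahead, (?!...), which succeeds only when the text right after the current position does not match; a lookbehind, (?<!...), does the same check on the text before it. The following is only a sketch for the start-of-string case, using the two patterns and two strings from the post; the names @patterns, $any and strip_unless_adjacent are hypothetical:

```perl
use strict;
use warnings;

my @patterns = ('123', '456');   # the known patterns from the pseudocode
# Alternation of all known patterns, each quoted literally.
my $any = join '|', map { quotemeta($_) } @patterns;

# Hypothetical helper: if a known pattern starts the string and is NOT
# immediately followed by another known pattern, remove it together with
# the next 6 characters. Otherwise return the string unchanged.
sub strip_unless_adjacent {
    my ($string) = @_;
    $string =~ s/^(?:$any)(?!$any)\w{6}//;
    return $string;
}

print "string_1 = ", strip_unless_adjacent('1234567890'), "\n";   # 1234567890
print "string_2 = ", strip_unless_adjacent('1236547890'), "\n";   # 0
```

The substitution returns true only when something was removed, so its return value could drive the if/else branch from the pseudocode. The end-of-string case would need the mirrored regex with a lookbehind, which is not shown here.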