blarsen has asked for the wisdom of the Perl Monks concerning the following question:

Below is a simple script in which I am trying to find the locations of a given sequence within my DNA strand.
When I attempt to run it I get:

Global symbol "$dna1" requires explicit package name at restrictionSites.pl line 21.
syntax error at restrictionSites.pl line 21, near "){"
Global symbol "$dna1" requires explicit package name at restrictionSites.pl line 22.
I'm not entirely sure what to change or how. Any suggestions would be appreciated; it's probably very simple and I'm overthinking it. Thanks in advance.

#!/usr/bin/perl
use strict;
use warnings;
my $dna1 = "AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCAC AATCGCGGGTTTCATAGAAAATGGTTGGGAAGGAATGGTGGATGGTTGGTACGGTTT"
#deleted most so it's easier to read

while ($dna1 =~ s/\s/GA{ACT}TC/g){
print "Found GA{ACT}TC at", pos($dna1)-4, "\n";
}

Re: RegEx and Packaging Name Problem
by BrowserUk (Patriarch) on Oct 22, 2010 at 00:54 UTC

    You are missing a semicolon ';'.

    my $dna1 = "AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCAC AATCGCGGGTTTCATAGAAAATGGTTGGGAAGGAATGGTGGATGGTTGGTACGGTTT"   <<<<< MISSING ';'!! ...
Re: RegEx and Packaging Name Problem
by Dru (Hermit) on Oct 22, 2010 at 01:05 UTC
    Also, your regular expression is wrong, and it does not need to be in a while loop if you only want to know whether the one long string contains a match.
    print "Match found\n" if $dna1 =~ /GA[ACT]TC/;

    Thanks,
    Dru
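A minimal sketch of Dru's point, assuming the site is GA, then exactly one of A, C or T, then TC: the character class [ACT] matches a single base from that set, whereas {ACT} is not a character class at all. For a yes/no answer, one match without /g is enough. The helper name has_site and the test strings are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the sequence contains GA, one of A/C/T, then TC.
sub has_site {
    my ($seq) = @_;
    return $seq =~ /GA[ACT]TC/ ? 1 : 0;
}

# GAATC, GACTC and GATTC match; GAGTC does not (G is not in [ACT]).
for my $seq (qw(GAATC GACTC GATTC GAGTC)) {
    printf "%s => %s\n", $seq, has_site($seq) ? "match" : "no match";
}
```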
Re: RegEx and Packaging Name Problem
by jwkrahn (Abbot) on Oct 22, 2010 at 01:11 UTC
    while ($dna1 =~ s/\s/GA{ACT}TC/g){ print "Found GA{ACT}TC at", pos($dna1)-4, "\n"; }

    If you read the documentation for pos, you will notice it says 'Returns the offset of where the last "m//g" search left off', but you are not using m//g.

    Your message says "Found GA{ACT}TC" but your search pattern is /\s/ which is a single whitespace character.

    By "pos($dna1)-4" I assume that you want the start of the pattern but a single whitespace character is only one character long and the string "GA{ACT}TC" is 9 characters long?

      Thanks for the help, everyone. The string I'm looking for is only 5 characters long: it's really GA[ACT]TC, I just wrote it differently when I brought it over. Sorry about the confusion. So I only subtract 4 to get the start of the pattern, counting from 1 the way a non-programmer would.
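Putting the replies together, here is a sketch of a corrected loop, assuming whitespace has already been removed from the strand and the site is the 5-base pattern GA[ACT]TC. In a while loop, m//g (not s///g) advances through the string and pos returns the 0-based offset just past each match, so pos minus 4 is the 1-based start the original poster wanted. The match variable $-[0] holds the 0-based start of the same match and is used here only as a cross-check. The sample strand and the helper name find_sites are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1-based start positions of every GA[ACT]TC site.
sub find_sites {
    my ($dna) = @_;
    my @starts;
    while ($dna =~ /GA[ACT]TC/g) {
        my $start = pos($dna) - 4;    # pos is just past the 5-base match
        # $-[0] is the 0-based start of the same match, so +1 must agree
        die "offset mismatch" unless $start == $-[0] + 1;
        push @starts, $start;
    }
    return @starts;
}

my $dna = "GAATCCGACTCTTGATTC";      # made-up test strand
print "Found GA[ACT]TC at $_\n" for find_sites($dna);   # 1, 7 and 14
```

Because no G can occur in the last four bases of a GA[ACT]TC site, two sites can never overlap, so plain m//g finds all of them without a lookahead.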
Re: RegEx and Packaging Name Problem
by snape (Pilgrim) on Oct 22, 2010 at 02:48 UTC

    You need a semicolon at the end of the statement. It should be:

    my $dna1 = "AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCACAATCGCGGGTTTCATAGAAAATGGTTGGGAAGGAATGGTGGATGGTTGGTACGGTTT";

    Also, your regular expression doesn't make sense to me. If you are just looking for a space in your string, I think you need an if statement rather than a while loop. I also don't understand the use of {} in your substitution: s/\s/GA{ACT}TC/g replaces each whitespace character with the literal text GA{ACT}TC. Finally, pos is a real built-in, but it reports where the last m//g match left off, and you are not using m//g.
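To see what the original substitution actually does, here is a small sketch on a made-up fragment: s/\s/GA{ACT}TC/g replaces every whitespace character with the literal text GA{ACT}TC and returns the number of replacements, so it never searches for the site at all. If the real aim is to clean up a strand pasted with spaces or line breaks (an assumption about the intent), stripping whitespace with s/\s+//g before searching is the more likely fit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $strand = "CAATCAC AATCG";   # made-up fragment with one embedded space

# What the original s/// does: each whitespace character becomes the
# literal text GA{ACT}TC, and the return value is the number of replacements.
my $copy  = $strand;
my $count = ($copy =~ s/\s/GA{ACT}TC/g);
print "$count substitution(s): $copy\n";   # 1 substitution(s): CAATCACGA{ACT}TCAATCG

# A more likely intent: strip all whitespace before searching,
# so a site that spans a space or line break is not missed.
(my $clean = $strand) =~ s/\s+//g;
print "$clean\n";                            # CAATCACAATCG
```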

Re: RegEx and Packaging Name Problem
by umasuresh (Hermit) on Oct 23, 2010 at 00:23 UTC
    A good tool to have for debugging is ptkdb.
    You will need to install Tk prior to that.
    Both modules can easily be installed by running ppm install module_name at the DOS prompt. I have found ptkdb to be the most useful debugging tool for Perl.
    You may also want to look at the book Beginning Perl for Bioinformatics by James Tisdall.