DavyK has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am basically only starting to learn Perl, and I am using BioPerl to try and do something that I think should be rather simple (famous last words!).

Basically I have a list of genes (~5000) and I want to search the upstream sequence of those genes for the presence of a particular transcription factor binding site (TFBS).

To do this, I thought I could find the genomic coordinates of each gene, which I managed by using EntrezGene and extracting the from and to coordinates from the URL (wiki here).

Now I need to find a way to actually get the upstream sequence (start_pos - 3500 .. start_pos + 100). I have tried the example given in the BioPerl wiki (wiki here) but it isn't working for me and I am not really sure as to why, or what to try next.
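
For reference, the window arithmetic described above (start_pos - 3500 .. start_pos + 100) can be sketched in plain Perl. This is only a minimal sketch, not the wiki's code: upstream_window is a hypothetical helper, not a BioPerl function, and it assumes 1-based coordinates with from < to and a strand given as 1 or -1. For a minus-strand gene the start is taken to be the higher coordinate, so "upstream" lies at larger positions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given a gene's 1-based
# from/to coordinates (assumed from < to) and its strand, return the
# window running from $up bp upstream to $down bp past the gene start.
sub upstream_window {
    my ($from, $to, $strand, $up, $down) = @_;
    $up   = 3500 unless defined $up;
    $down = 100  unless defined $down;

    my ($win_start, $win_end);
    if ($strand >= 0) {
        # Plus strand: the gene starts at the lower coordinate.
        ($win_start, $win_end) = ($from - $up, $from + $down);
    }
    else {
        # Minus strand: the gene starts at the higher coordinate, and
        # upstream lies at larger coordinates.
        ($win_start, $win_end) = ($to - $down, $to + $up);
    }

    # Clamp at the first base of the chromosome or contig.
    $win_start = 1 if $win_start < 1;
    return ($win_start, $win_end);
}

# Example with made-up coordinates.
for my $strand (1, -1) {
    my ($s, $e) = upstream_window(10_000, 12_000, $strand);
    print "strand $strand: $s..$e\n";
}
```

The resulting pair is what would be passed as the start and stop of the region to fetch; a minus-strand window would still need to be reverse-complemented after retrieval.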

If any of you have any ideas, I would be very grateful. Thanks for helping a total noob out. Cheers, Davy.

Re: Bioperl Sequence Retrieval
by ww (Archbishop) on Sep 29, 2010 at 13:22 UTC
    As a "noob" (nothing wrong with that; we all were, once, and some still are after years of experience), you're not giving us much to go on.

    Specifically, you're giving us no way to check your code for typos that make it vary from the wiki entry (assuming that the wiki version is correct, which it ain't, always).

    So, here's a boilerplate idea: post a minimal but executable snippet of your code that illustrates your attempt to "get the upstream sequence" (your post seems to indicate that's where the problem is; if that's incorrect, correct me); the error message(s), if any, or a narrative explanation of exactly what "isn't working for me" means; and a small amount of sample data.

Re: Bioperl Sequence Retrieval
by umasuresh (Hermit) on Sep 29, 2010 at 17:35 UTC