Re: Re: Re: Re: BioInformatics - polyA tail search

by MiamiGenome (Sexton)
on Sep 02, 2003 at 18:36 UTC

in reply to Re: Re: Re: BioInformatics - polyA tail search
in thread BioInformatics - polyA tail search

You are correct, but when the automated sequencers cannot unambiguously choose A, C, T, or G, they assign 'N'.

My chosen example sequences were from the beginning of a sequence file. PolyA stretches are found at the end.

Cheers, and thank you in advance!

Re: Re: Re: Re: Re: BioInformatics - polyA tail search
by BrowserUk (Patriarch) on Sep 02, 2003 at 19:14 UTC

    Something like this will get you started.

    perl -nle"print qq[$ARGV:($./$+[0]): $1] if m[([AN]{10,})]g" file*

    This will print lines like

    filename:(10/50): ANNNANANAAN

    where the first number is the line number in the file and the second is the offset within the line at which the match ends ($+[0]; use $-[0] if you want the offset at which it starts).
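
    If you want every run on a line rather than just the first, the same regex can be driven from a while loop. This is only a rough sketch: the sub name find_runs and the default minimum length of 10 are my own choices for illustration, and the sample sequence is made up.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return every run of A/N at least $min long in $line
    # as [start offset, end offset, matched text]. Offsets are 0-based, and the
    # end offset is one past the last matched character (what $+[0] reports).
    sub find_runs {
        my ($line, $min) = @_;
        $min = 10 unless defined $min;
        my @runs;
        # In scalar context m//g resumes where the previous match ended,
        # so the loop visits each non-overlapping run in turn.
        while ($line =~ m/([AN]{$min,})/g) {
            push @runs, [ $-[0], $+[0], $1 ];
        }
        return @runs;
    }

    # Made-up sample line containing two qualifying runs.
    my $seq = 'CCGT' . 'ANNNANANAAN' . 'GGC' . 'AAAAAAAAAAAA';
    printf "%d-%d: %s\n", @$_ for find_runs($seq);
    ```

    Note that [AN] also accepts a run made up entirely of Ns, so you may want to filter those out afterwards.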

    For Unix you need to swap the "s for 's, and under Win32 you would need to add BEGIN{ @ARGV=map{ glob } @ARGV } to expand the wildcard filespec supplied on the command line.
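
    In script form the wildcard expansion might look something like the following. Again, just a sketch: expand_args is a name I made up, and it assumes file names without embedded spaces, since the built-in glob splits its argument on whitespace.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: expand wildcard patterns that the shell passed
    # through unexpanded (as cmd.exe does). Names without wildcard characters
    # come back unchanged, so it is harmless on already-expanded arguments.
    sub expand_args {
        return map { glob($_) } @_;
    }

    # Same search as the one-liner: first run of 10 or more A/N on each line.
    for my $file (expand_args(@ARGV)) {
        open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next };
        while (my $line = <$fh>) {
            chomp $line;
            # $. is the line number of $fh; $+[0] is where the match ended.
            print "$file:($./$+[0]): $1\n" if $line =~ m/([AN]{10,})/;
        }
        close $fh;
    }
    ```

    Because each file gets its own filehandle here, the line numbers restart at 1 for every file, which the one-liner does not do.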

    See perlrun for the switches used, and perlre for the regex.

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

