theward has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am a newbie at Perl and am trying to teach myself to become better. I would like to ask for some wisdom.

I have some code and know what it's doing, but I can't get my head around how it's actually doing it. I have a FASTA file (micro.txt) with several microRNAs in the form of: >hsa-let-7a-5pTGAGGTAGTAGGTTGTATAGTT>hsa-let-7a-3pCTATACAA......etc. The code is:

my $micro = 'micro.txt'; open(IN, $micro) or die "Can't open file $micro because $!\n"; while(my $line=<IN>) { if ($line=~/>/) { chomp($line); $line=~s/>//; my $sequence=<IN>; chomp($sequence); } close(IN);

I know that the code is separating the 'header' (i.e.>hsa-let-7a-5p) and 'sequence' part (i.e. TGAGGTAGTAGGTTGTATAGTT) but how is this doing it by just using chomp? My understanding was that chomp just removed any whitespace at the end of a line?? Please help!!

Re: Not sure how it's working?
by AnomalousMonk (Archbishop) on Feb 13, 2015 at 06:45 UTC
    I have a ... file ... in the form of ...

    Remember that whitespace is but the callow plaything of the HTML/XML renderer of the browser with which you, I and others view your post. Do you see form or structure in the data as posted? It's possible to guess at this structure, but why should we? Please update your original post to use  <c> ... </c> or  <code> ... </code> tags around input (and output, if such there be) data as well as your code. Please see Markup in the Monastery and Writeup Formatting Tips.

    I have some code and know what it's doing ...

    The title of your post implies that the code works, but the posted code does not compile. It's possible by fiddling around with it a bit to make it compile, but then, as noted above, it doesn't really do anything. Again, why should we guess about all this? Please give a small, executable code example that operates on a small, downloadable data sample and that illustrates to us what it is that you know it's doing. (And a verbal description of what it's supposed to do would be helpful, too.) Please see I know what I mean. Why don't you?.


    Give a man a fish:  <%-(-(-(-<

Re: Not sure how it's working?
by jmmitc06 (Beadle) on Feb 13, 2015 at 06:54 UTC
    Please verify that this code is doing what you think it is doing. As posted, there's really no way for you to tell that you are separating the header and the sequence (I assume you printed something, but that is only a guess). Until you do that, I'm not sure that the posted code does what you believe it does. You should post the output of your script for an example FASTA file, and what you want the output to be if the two differ.
Re: Not sure how it's working?
by Anonymous Monk on Feb 13, 2015 at 04:39 UTC

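    Your understanding of chomp is mostly accurate: it removes the trailing line ending (the value of $/, a newline by default), not whitespace in general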

    However, since this program accomplishes nothing real ... if you can explain what you think every line is doing, then I could help you make sense of it
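
    A quick way to check what chomp strips is a sketch like the one below. The helper name chomped is made up for illustration, and the strings are toy input rather than anything from micro.txt.

```perl
use strict;
use warnings;

# chomp removes one trailing input record separator ($/, "\n" by default).
# It does not strip spaces or tabs, and it never splits a line.
# "chomped" is a hypothetical helper that returns a chomped copy.
sub chomped {
    my ($text) = @_;
    chomp(my $copy = $text);
    return $copy;
}

# Brackets make any remaining trailing characters visible.
print "[", chomped("TGAGG\n"),   "]\n";   # trailing newline removed
print "[", chomped("TGAGG  \n"), "]\n";   # trailing spaces are kept
print "[", chomped("TGAGG"),     "]\n";   # nothing to remove
```

    So chomp only tidies the end of each line that has been read; whatever separates header from sequence has to come from somewhere else.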

Re: Not sure how it's working?
by Anonymous Monk on Feb 13, 2015 at 06:43 UTC
    Are you saying that given the line
    >hsa-let-7a-5pTGAGGTAGTAGGTTGTATAGTT
    the Perl code you've shown splits it into two lines?
    hsa-let-7a-5p
    TGAGGTAGTAGGTTGTATAGTT
    Well you "can't get your head around how it's actually doing it" because it doesn't. But shouldn't fasta files have headers on separate lines? That is, aren't header and body different lines to begin with?
    >hsa-let-7a-5p
    TGAGGTAGTAGGTTGTATAGTT
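
    If the file really does have each header and each sequence on its own line, the loop in the question would work roughly as sketched below: the while condition reads a header line, and the second <IN> inside the loop body reads the line after it. This is only a sketch under that assumption. It reads from an in-memory string instead of micro.txt, the sub name read_records is invented, and the second record is made-up sample data.

```perl
use strict;
use warnings;

# Assumed layout: header and sequence on separate lines.
# The second record is made-up sample data.
my $fasta = ">hsa-let-7a-5p\nTGAGGTAGTAGGTTGTATAGTT\n>made-up-header\nACGTACGT\n";

# Hypothetical helper: returns a list of [header, sequence] pairs.
sub read_records {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "Can't open in-memory file: $!\n";
    my @records;
    while (my $line = <$in>) {          # reads one line: the header
        next unless $line =~ /^>/;      # skip anything that is not a header
        chomp($line);                   # drop the header's newline
        $line =~ s/^>//;                # drop the leading '>'
        my $sequence = <$in>;           # reads the NEXT line: the sequence
        last unless defined $sequence;  # header with no line after it
        chomp($sequence);               # drop the sequence's newline
        push @records, [$line, $sequence];
    }
    close($in);
    return @records;
}

print "$_->[0] $_->[1]\n" for read_records($fasta);
```

    The separation comes from the two separate reads, each returning one line; chomp only removes the newline from whatever was read.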