my $num_chars = 0;
<>;
while ( <> ) {
    do { print "$num_chars\n"; $num_chars = 0; next } if /^>/;
    $num_chars += tr/ACGT/ACGT/;
}
print "$num_chars\n";
or
my @num_chars = ();
my $i = -1;
while ( <> ) {
    $i++, next if /^>/;
    $num_chars[$i] += tr/ACGT/ACGT/;
}
foreach ( @num_chars ) {
    print "$_\n";
}
both seem to work for me. That is, for each group of lines separated by lines starting with '>', each one accurately prints the number of [ACGT] characters that appear in that group.

The first gets by with one declared scalar and one loop. The second requires more memory since it uses an array instead of a single scalar. That array is useful, though, if you want to do anything with the values other than just print them. The second also uses two loops instead of one, which could be a speed factor on large data sets. The second loop could be removed easily enough.
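As a rough sketch of what keeping the array buys you, the counts can be collected once and then reused for more than printing. The sub name group_counts, the sample data, and the use of List::Util are my own choices for illustration, not part of the code above. This version assumes the input begins with a '>' line, and it sets each group's count to zero when its header is seen, which is a small change from the original.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical helper: returns one A/C/G/T count per '>'-delimited group.
# Assumes the first line of input is a '>' header.
sub group_counts {
    my ($fh) = @_;
    my @num_chars;
    my $i = -1;                      # same pre-loop fixup as the second example
    while ( my $line = <$fh> ) {
        if ( $line =~ /^>/ ) {
            $num_chars[ ++$i ] = 0;  # start a new group at zero
            next;
        }
        $num_chars[$i] += ( $line =~ tr/ACGT// );
    }
    return @num_chars;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = ">seq1\nACGT\nAC\n>seq2\nGGN\n>seq3\nTTTTA\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @counts = group_counts($fh);

# One print call instead of an explicit second loop.
print join( "\n", @counts ), "\n";

# The array can also feed other calculations.
print "total: ", sum(@counts), "\n";
print "largest group: ", max(@counts), "\n";
```

With the sample data this prints 6, 2 and 5 on separate lines, followed by "total: 13" and "largest group: 6".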

On the loop fixups:
Someone (not here on PerlMonks) recently asked me why I tend to use loop fixups instead of additional tests within a loop. The first example has a pre-loop fixup involving input and a post-loop fixup involving output. The second initializes the array index to -1 instead of zero before the loop. One might wonder why I used these fixups, since they may not be quite as clear to some as an additional test inside the loop.

The loops are cluttered enough as it is, and sometimes visual clarity matters more than having every case spelled out inside the loop. I find fixups simpler to debug than an additional flow-control change, especially inside a loop. This way the loops stay consistent and focus on the general case we're interested in, which I find better than having special-case code folded into the general-case code when I try to analyze a program's behavior. Also, because these simple and easily understood actions happen once outside the loop rather than being tested on every iteration, you could see notable performance differences on large data sets.

If the printing and resetting of the counter in the first example get more complex, they could be put into a single subroutine, called both from within the loop and for the last output after the loop, making the final output a single entity to debug along with the others. So it's not just optimization. I actually find it easier to do things this way. YMMV.
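Here is a minimal sketch of that subroutine idea, applied to the first example. The names flush_count and count_groups, the in-memory sample data, and the choice to build the output as a string rather than print directly are all my own assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the output line for the finished group and
# resets the caller's counter through the reference.
sub flush_count {
    my ($count_ref) = @_;
    my $line = "$$count_ref\n";
    $$count_ref = 0;
    return $line;
}

# Counts A, C, G and T per group; groups are separated by lines starting
# with '>'. Assumes the first line of input is a '>' header.
sub count_groups {
    my ($fh) = @_;
    my $num_chars = 0;
    my $out       = '';
    my $first_header = <$fh>;       # pre-loop fixup: discard the first header
    while ( my $line = <$fh> ) {
        if ( $line =~ /^>/ ) {
            $out .= flush_count( \$num_chars );   # same call as after the loop
            next;
        }
        $num_chars += ( $line =~ tr/ACGT// );
    }
    $out .= flush_count( \$num_chars );           # post-loop fixup
    return $out;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = ">seq1\nACGT\nAC\n>seq2\nGGN\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print count_groups($fh);    # prints 6, then 2
```

Because the in-loop output and the final output go through the same sub, any later change to the output format only has to be made and debugged in one place.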



Christopher E. Stith

In reply to Re: parsing question by mr_mischief
in thread parsing question by Anonymous Monk
