james.v has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am trying to alter this script:

 perl -lne '/>/ && do {print $c if defined $c; $c = 0; print} || $c++; END {print $c}' input_file > output_file

I want the output to count only the unique lines between the lines starting with '>'.

Currently the script works well at generating overall counts of the lines that fall between lines beginning with '>'.

Example input_file:
>05143_African_trypanosomiasis
TRINITY_DN26760_c1_g1
18169
42987
42987
>05145_Toxoplasmosis
43736
38319
38320
38320
TRINITY_DN24151_c3_g1
TRINITY_DN25493_c0_g1
Example of output_file:
>05143_African_trypanosomiasis
4
>05145_Toxoplasmosis
6
Example of desired output:
>05143_African_trypanosomiasis
3
>05145_Toxoplasmosis
5

I'm also very new to Perl, so an explanation of the tweaked code would be much appreciated.

-James

Re: Getting unique line counts between lines starting with '>'
by choroba (Cardinal) on Oct 19, 2017 at 21:34 UTC
    Use a hash keyed by the lines; the keys of a hash are always unique.

    perl -lne 'sub out {print $h, "\n", scalar keys %c if %c } />/ and do { out(); %c =(); $h = $_ } or $c{$_}++; END { out() }' < input-file
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
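Since an explanation was requested, here is a minimal sketch of the same hash idea written as a commented script rather than a one-liner. It is not choroba's exact code: the sub name `count_unique` and the built-in sample data are assumptions made for illustration, the header test is anchored as `/^>/` (the one-liner matches `>` anywhere in the line), and a header with no lines under it is reported with a count of 0, whereas the one-liner skips empty groups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_unique(@lines) is a hypothetical helper, not part of the original reply.
# It returns a list of [header, unique_count] pairs.
sub count_unique {
    my @lines = @_;
    my (@result, $header, %seen);
    for my $line (@lines) {
        if ($line =~ /^>/) {
            # A new header: report the previous group, then start a fresh hash.
            push @result, [$header, scalar(keys %seen)] if defined $header;
            $header = $line;
            %seen   = ();
        }
        elsif (defined $header) {
            # Each distinct line becomes one hash key; repeats only bump the value.
            $seen{$line}++;
        }
    }
    # Report the last group, which has no following header to trigger it.
    push @result, [$header, scalar(keys %seen)] if defined $header;
    return @result;
}

# Sample data copied from the question.
my @sample = qw(
    >05143_African_trypanosomiasis
    TRINITY_DN26760_c1_g1 18169 42987 42987
    >05145_Toxoplasmosis
    43736 38319 38320 38320 TRINITY_DN24151_c3_g1 TRINITY_DN25493_c0_g1
);

print "$_->[0]\n$_->[1]\n" for count_unique(@sample);
```

On the sample data this prints each header followed by 3 and 5 respectively. The final `push` after the loop plays the same role as the `END { out() }` block in the one-liner.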

      A quick question. Is there a way to get the keys of the %c hash printed as a comma-delimited list?

      For example:

      >05143_African_trypanosomiasis
      3
      TRINITY_DN26760_c1_g1, 18169, 42987
      >05145_Toxoplasmosis
      5
      43736, 38319, 38320, TRINITY_DN24151_c3_g1, TRINITY_DN25493_c0_g1

      best,

      James

        Update: Corrected for Comma-separated numbers.
        $ perl -lne 'sub prt{@c && print scalar @c,"\n",join ", ",@c;@c=();print} />/?prt:push @c,$_}{prt' TheFileName
        output
        >05143_African_trypanosomiasis
        4
        TRINITY_DN26760_c1_g1, 18169, 42987, 42987
        >05145_Toxoplasmosis
        7
        43736, 38319, 38320, 38320, TRINITY_DN24151_c3_g1, TRINITY_DN25493_c0_g1
        For info on "}{", see "Eskimo greeting" in perlsecret.
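The one-liner above lists every line, duplicates included. As a rough sketch of how the unique-count requirement and the comma-separated list could be combined, the script below keeps only the first occurrence of each line per group. The sub name `unique_groups` and the built-in sample data are assumptions for illustration, not code from the thread. The loop is written out explicitly, so it does not need the `}{` trick, which relies on `-n` wrapping the code in a `while (<>) { ... }` loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# unique_groups(@lines) is a hypothetical helper, not code from the thread.
# It returns a list of [header, \@unique_items] pairs, items in first-seen order.
sub unique_groups {
    my @lines = @_;
    my (@groups, %seen);
    for my $line (@lines) {
        if ($line =~ /^>/) {
            push @groups, [$line, []];   # start a new group for this header
            %seen = ();                  # forget items seen under the previous header
        }
        elsif (@groups && !$seen{$line}++) {
            # First time this line appears in the current group: keep it.
            push @{ $groups[-1][1] }, $line;
        }
    }
    return @groups;
}

# Sample data copied from the question.
my @sample = qw(
    >05143_African_trypanosomiasis
    TRINITY_DN26760_c1_g1 18169 42987 42987
    >05145_Toxoplasmosis
    43736 38319 38320 38320 TRINITY_DN24151_c3_g1 TRINITY_DN25493_c0_g1
);

for my $group (unique_groups(@sample)) {
    my ($header, $items) = @$group;
    print "$header\n", scalar(@$items), "\n", join(", ", @$items), "\n";
}
```

A plain `join ", ", keys %c` on the hash from the earlier reply would also give a comma-delimited list, but hash keys come back in no particular order; the array used here preserves the order in which the lines first appeared.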

                        All power corrupts, but we need electricity.

      Thank you for the help Choroba, works like a charm!

      best,

      James