Re^2: Faster and more efficient way to read a file vertically

So basically I have this (brute-force attack):

    while(<>)
    {
        if($_=~/^(.*?)\t(.*)/)
        {
            $read_seq=$1;
            $read_id=$2;

            @split_read=split(//, $read_seq);
            $respective_read_letter=$split_read[$i];

            if($respective_read_letter eq 'A')
                {$count_A++;}
            elsif($respective_read_letter eq 'T')
                {$count_T++;}
            elsif($respective_read_letter eq 'C')
                {$count_C++;}
            elsif($respective_read_letter eq 'G')
                {$count_G++;}
            elsif($respective_read_letter eq '.')
                {$count_dot++;}
            else
                {print "ERROR in read: $read\t$respective_read_letter\n";}
        }
    }

    $total=$count_A+$count_T+$count_C+$count_G+$count_dot;

    $fraction_A = sprintf("%.2f", 100*($count_A/$total));
    $fraction_T = sprintf("%.2f", 100*($count_T/$total));
    $fraction_C = sprintf("%.2f", 100*($count_C/$total));
    $fraction_G = sprintf("%.2f", 100*($count_G/$total));
    $fraction_dot = sprintf("%.2f", 100*($count_dot/$total));
    print $actual_pos,"\t",$expected_letter,"\t",$fraction_A,"\t",$fraction_T,"\t",$fraction_G,"\t",$fraction_C,"\t",$fraction_dot,"\n";

Replies are listed 'Best First'.
Re^3: Faster and more efficient way to read a file vertically by pryrt (Abbot) on Nov 03, 2017 at 16:15 UTC
If you're really only going to be doing one column, but want it to be chosen by the variable `$i`, I'd suggest substr: `$respective_read_letter = substr $read_seq, $i, 1;`.

If finding an optimum solution is important to you (i.e., if you'll use this script many times for the foreseeable future, rather than just once or twice where "fast enough" is fast enough), then I'd recommend Benchmarking the substr vs unpack vs LanX's regex (and any others that are suggested). But whatever you do, make sure to use ++LanX's hash `%count`.

    use warnings;
    use strict;
    use Benchmark qw/cmpthese/;
    use Test::More tests => 1;

    # 1000 random 30-letter reads
    my @dataset = ();
    push @dataset, join('', map { (qw/A C G T/)[rand 4] } 1 .. 30 ) for 1 .. 1000;
    my $i = $ARGV[0] // 10;

    # run the given extractor over every read and tally the letters
    sub test {
        my $fnref = shift;
        my $count;
        for my $read_seq( @dataset ) {
            my $letter = $fnref->($read_seq, $i);
            $count->{$letter}++;
        }
        return $count;
    }

    sub rfn {
        test( sub {
            my $skip = $_[1];
            $_[0] =~ /.{$skip}(.)/;
            return $1;
        });
    };

    sub sfn {
        test( sub {
            substr $_[0], $_[1], 1;
        });
    };

    sub ufn {
        test( sub {
            ... # I'm no unpack expert
        });
    };

    cmpthese(0, {
        regex => \&rfn,
        substr => \&sfn,
        #unpack => \&ufn,
    });

    is_deeply rfn(), sfn(), 'same results';
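For anyone who wants to add the unpack case to the comparison, here is a minimal sketch of what a single-column extraction with unpack might look like. The helper name `letter_via_unpack` is hypothetical and not from the thread, and the sketch assumes single-byte characters and an offset `$i` that lies inside the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the single character
# at zero-based offset $i of $seq using unpack.
# Assumption: $i is within the string and the data is single-byte (A/C/G/T/.).
sub letter_via_unpack {
    my ($seq, $i) = @_;
    # "x$i" skips $i bytes, "a1" then reads one byte as a string
    return unpack "x$i a1", $seq;
}

print letter_via_unpack('ACGT.ACGT', 4), "\n";   # prints "."
```

Whether this beats substr is exactly what the benchmark is for; nothing here should be read as a claim about which one is faster.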
Re^3: Faster and more efficient way to read a file vertically by LanX (Saint) on Nov 03, 2017 at 15:26 UTC
`$i` is variable in your example. Reading vertically doesn't make sense then.

I'd suggest `$count{$letter}++` with a hash `%count` to speed things up.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!
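The hash suggestion can be sketched roughly as follows, combined with the substr extraction from the other reply. This is only an illustration: the helper names `count_column` and `percentages` are made up here, each line is assumed to be "sequence, TAB, read id" as in the original loop, and unlike the original, any letter found in the column is counted toward the total rather than reported as an error.

```perl
use strict;
use warnings;

# Hypothetical helper: tally the letter at zero-based column $i across lines.
# Assumption: each line looks like "sequence<TAB>read id"; other lines are skipped.
sub count_column {
    my ($i, @lines) = @_;
    my %count;
    for my $line (@lines) {
        my ($read_seq) = $line =~ /^([^\t]*)\t/ or next;
        next unless length($read_seq) > $i;      # read too short for this column
        $count{ substr $read_seq, $i, 1 }++;     # one hash instead of five scalars
    }
    return \%count;
}

# Hypothetical helper: percentages for the letters the original script reports.
sub percentages {
    my ($count) = @_;
    my $total = 0;
    $total += $_ for values %$count;
    return {} unless $total;                     # avoid division by zero
    my %pct;
    $pct{$_} = sprintf('%.2f', 100 * ($count->{$_} // 0) / $total) for qw(A T G C .);
    return \%pct;
}

my $count = count_column(2, "ACGT\tr1", "ACTT\tr2", "AC.T\tr3", "ACGA\tr4");
my $pct   = percentages($count);
print join("\t", map { $pct->{$_} } qw(A T G C .)), "\n";
```

With the four sample lines above, column 2 holds G, T, ., G, so the printed row is 0.00, 25.00, 50.00, 0.00, 25.00 in A, T, G, C, dot order.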

