Re: Faster and more efficient way to read a file vertically

A similar question was asked in the Monastery before: How do I get the Nth Character of a String?

Here is some sample code from that question:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use feature 'say';
# use Benchmark qw(:all) ; # WindowsOS
use Benchmark::Forking qw( timethese cmpthese ); # UnixOS

sub getn_unpack {
    return unpack "x" . ($_[1]-1) . "a", $_[0];
}

sub getn_substr {
    return substr $_[0], $_[1]-1, 1;
}

sub getn_split {
    return +(split //, $_[0])[$_[1]-1];
}

my $strNum = "12345678910";
my $string = "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA";

# say getn_unpack($string, 10);
# say getn_substr($string, 10);
# say getn_split($string, 10);

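# Pass code references: writing getn_unpack($string, 10) here would hand
# timethese() the returned character instead of code to time.
# A negative count runs each test for at least that many CPU seconds.
my $results = timethese(-3, {
    'unpack' => sub { getn_unpack($string, 10) },
    'substr' => sub { getn_substr($string, 10) },
    'split'  => sub { getn_split($string, 10) },
}, 'none');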
cmpthese( $results );

__END__

$ perl test.pl
              Rate unpack substr  split
unpack 171232877/s     --   -23%   -31%
substr 223713647/s    31%     --   -10%
split  248138958/s    45%    11%     --
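Before timing the three helpers, it may be worth confirming that they actually return the same character for every position, otherwise the benchmark compares different work. A minimal sketch, reusing the three subs from above unchanged and checking every 1-based position of the sample string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same three helpers as in the benchmark above (1-based position N)
sub getn_unpack { return unpack "x" . ($_[1]-1) . "a", $_[0]; }
sub getn_substr { return substr $_[0], $_[1]-1, 1; }
sub getn_split  { return +(split //, $_[0])[$_[1]-1]; }

my $string = "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA";

# Compare the three results at every position; stop on the first disagreement
for my $n (1 .. length $string) {
    my @got = (getn_unpack($string, $n),
               getn_substr($string, $n),
               getn_split($string, $n));
    die "mismatch at position $n: @got\n"
        if $got[0] ne $got[1] or $got[1] ne $got[2];
}
print "all three agree on every position\n";
```

If it prints the success line, any difference in the chart is down to speed alone.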

It looks like the more efficient choice would be to use unpack. Something like the following could do what you need: read the file one line at a time, extract the character you want from each line, and push it into an array. Sample code below:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Return the Nth character (1-based) of a string: skip N-1 bytes, then take one.
sub getn_unpack {
    return unpack "x" . ($_[1]-1) . "a", $_[0];
}

my $file = 'data.txt';
my @array;

if (open(my $fh, '<', $file)) {
  while (<$fh>) {
      chomp;
      push @array, getn_unpack($_, 10);
  }
} else {
  warn "Could not open file '$file' $!\n";
}

print Dumper \@array;

__END__

$ cat data.txt
ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA

$ perl test.pl
$VAR1 = [
          'C',
          'A'
        ];
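One caveat with the per-line approach: if a line is shorter than the position you ask for, unpack with an "x" count past the end of the string dies ("'x' outside of string in unpack"), and substr returns undef with a "substr outside of string" warning. A defensive sketch, assuming you would rather record undef for short lines than abort; `getn_safe` is a hypothetical helper name, and the in-memory `@lines` stands in for reading data.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: Nth character (1-based), or undef if the line is too short
sub getn_safe {
    my ($str, $n) = @_;
    # Explicit "return undef" (not a bare return) so map below keeps one
    # element per line even when the position is out of range
    return undef if $n < 1 || length($str) < $n;
    return substr $str, $n - 1, 1;
}

# In-memory lines standing in for the contents of data.txt
my @lines = (
    "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA",
    "ACAT",
    "ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA",
);

my @column = map { getn_safe($_, 10) } @lines;
print join(",", map { defined $_ ? $_ : "undef" } @column), "\n";    # C,undef,A
```

With a real file you would call `getn_safe($_, 10)` inside the `while (<$fh>)` loop after `chomp`, exactly where the original code calls `getn_unpack`.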

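Update: Thanks to fellow Monk karlgoethebier for spotting my mistake (cmpthese sorts its chart from slowest to fastest, so in the benchmark above unpack is actually the slowest and split the fastest). I would therefore suggest an alternative solution to your problem: use split instead of unpack. See the sample code below: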

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Return the Nth character (1-based) by splitting the string into single characters.
sub getn_split {
    return +(split //, $_[0])[$_[1]-1];
}

my $file = 'data.txt';
my @array;

if (open(my $fh, '<', $file)) {
  while (<$fh>) {
      chomp;
      push @array, getn_split($_, 10);
  }
} else {
  warn "Could not open file '$file' $!\n";
}

print Dumper \@array;

__END__

$ cat data.txt
ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA

$ perl test.pl
$VAR1 = [
          'C',
          'A'
        ];
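Since the goal is to read the file "vertically", you may eventually want more than one column. A sketch under that assumption: split each line once and take a list slice of all wanted positions, instead of calling `getn_split` once per column (which would split the same line repeatedly). `@wanted` and `%columns` are illustrative names, and the in-memory `@lines` stands in for the chomped lines of data.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of 1-based columns to extract
my @wanted = (1, 10, 40);
my %columns;    # column number => array ref of characters, one per line

# In-memory lines standing in for the (chomped) contents of data.txt
my @lines = (
    "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA",
    "ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA",
);

for my $line (@lines) {
    my @chars  = split //, $line;                # split the line only once
    my @picked = @chars[ map { $_ - 1 } @wanted ];    # 0-based slice
    push @{ $columns{ $wanted[$_] } }, $picked[$_] for 0 .. $#wanted;
}

for my $col (@wanted) {
    print "column $col: @{ $columns{$col} }\n";
}
```

For the two sample lines this prints `column 1: A A`, `column 10: C A` and `column 40: A A`.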

Hope this helps, BR

Seeking for Perl wisdom...on the process of learning...not there...yet!


Re^2: Faster and more efficient way to read a file vertically by karlgoethebier (Abbot) on Nov 05, 2017 at 14:15 UTC
"...It looks like the more efficient choice would be to use unpack..."

I'm not so sure. As you wrote:

$ perl test.pl
              Rate unpack substr  split
unpack 171232877/s     --   -23%   -31%
substr 223713647/s    31%     --   -10%
split  248138958/s    45%    11%     --

Ergo:

karls-mac-mini:monks karl$ perl -e 'printf ("%.1f\n", 248138958/171232877);'
1.4

As I wrote at Re^6: Question on Regex: "...use cmpthese, the results are sorted from slow to fast..."

Sorry in advance if I ~~did something wrong~~ missed something.

Best regards, Karl

«The Crux of the Biscuit is the Apostrophe»

perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help
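Karl's ratio generalizes to every cell of the chart: as I read the Benchmark output, each percentage is the row's rate divided by the column's rate, minus one, times 100, which is why the fastest entry ends up in the bottom row. A small sketch recomputing all six cells from the three rates (copied by hand from the chart above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rates copied from the chart above (iterations per second)
my %rate = (unpack => 171232877, substr => 223713647, split => 248138958);

# Cell for row R and column C: (rate_R / rate_C - 1) * 100, rounded to a whole percent
for my $row (qw(unpack substr split)) {
    for my $col (qw(unpack substr split)) {
        next if $row eq $col;
        printf "%-6s vs %-6s: %+.0f%%\n",
            $row, $col, ($rate{$row} / $rate{$col} - 1) * 100;
    }
}
```

The printed values (for example `split  vs unpack: +45%` and `unpack vs substr: -23%`) match the chart, confirming that split was the fastest and unpack the slowest in that run.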
Re^3: Faster and more efficient way to read a file vertically by thanos1983 (Parson) on Nov 06, 2017 at 09:43 UTC
Hello karlgoethebier,

You are absolutely right. I also read Benchmark/Optional-Exports, where it is clearly stated:

cmpthese ( COUNT, CODEHASHREF, [ STYLE ] )

Optionally calls timethese(), then outputs comparison chart. This:

    cmpthese( -1, { a => "++\$i", b => "\$i = 2" } ) ;

outputs a chart like:

           Rate    b    a
    b 2831802/s   -- -61%
    a 7208959/s 155%   --

This chart is sorted from slowest to fastest, and shows the percent speed difference between each pair of tests. cmpthese can also be passed the data structure that timethese() returns:

Thanks for correcting me; I will also update my answer. Although, to be honest, I am rather surprised that unpack is slower than substr and split.

Thanks again for your time and effort, BR.