in reply to Faster and more efficient way to read a file vertically

Hello Anonymous Monk,

A similar question was asked at the Monastery before: How do I get the Nth Character of a String?

Here is some sample code based on that thread:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    use feature 'say';

    # use Benchmark qw(:all);                          # WindowsOS
    use Benchmark::Forking qw( timethese cmpthese );   # UnixOS

    sub getn_unpack {
        return unpack "x" . ($_[1]-1) . "a", $_[0];
    }

    sub getn_substr {
        return substr $_[0], $_[1]-1, 1;
    }

    sub getn_split {
        return +(split //, $_[0])[$_[1]-1];
    }

    my $strNum = "12345678910";
    my $string = "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA";

    # say getn_unpack($string, 10);
    # say getn_substr($string, 10);
    # say getn_split($string, 10);

    my $results = timethese(1000000000, {
        'unpack' => getn_unpack($string, 10),
        'substr' => getn_substr($string, 10),
        'split'  => getn_split($string, 10),
    }, 'none');

    cmpthese( $results );

    __END__

    $ perl test.pl
                  Rate unpack substr  split
    unpack 171232877/s     --   -23%   -31%
    substr 223713647/s    31%     --   -10%
    split  248138958/s    45%    11%     --
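One caveat about the benchmark above: each hash value is the result of calling the function once (a single character), not a code reference, so timethese is handed that character instead of the sub it is meant to time. A minimal sketch of how the comparison might be set up with anonymous subs follows. It uses the core Benchmark module rather than Benchmark::Forking, and a negative count (run each candidate for at least one CPU second); both are my substitutions, not the original setup. No timings are shown because they depend on the machine and the Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);   # core module; assumption: swapped in for Benchmark::Forking

# Same three candidates as in the post above
sub getn_unpack { return unpack "x" . ($_[1]-1) . "a", $_[0]; }
sub getn_substr { return substr $_[0], $_[1]-1, 1; }
sub getn_split  { return +(split //, $_[0])[$_[1]-1]; }

my $string = "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA";

# Each value is an anonymous sub, so the function call itself is what gets timed.
# A negative count means "run for at least that many CPU seconds".
cmpthese(-1, {
    unpack => sub { getn_unpack($string, 10) },
    substr => sub { getn_substr($string, 10) },
    split  => sub { getn_split($string, 10) },
});
```

The chart printed by cmpthese is sorted from slowest to fastest, so the last row is the fastest candidate.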

It looks like the more efficient choice would be unpack (but see the update below). Something like the following could do what you need: read one line at a time, extract the one character you want, and push it onto an array. Sample code below:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    sub getn_unpack {
        return unpack "x" . ($_[1]-1) . "a", $_[0];
    }

    my $file = 'data.txt';
    my @array;

    if (open(my $fh, '<', $file)) {
        while (<$fh>) {
            chomp;
            push @array, getn_unpack($_, 10);
        }
    } else {
        warn "Could not open file '$file' $!\n";
    }

    print Dumper \@array;

    __END__

    $ cat data.txt
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA

    $ perl test.pl
    $VAR1 = [
              'C',
              'A'
            ];

Update: Thanks to fellow Monk karlgoethebier for spotting my mistake. I would suggest an alternative solution to your problem: use split instead of unpack. See the sample code below:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    sub getn_split {
        return +(split //, $_[0])[$_[1]-1];
    }

    my $file = 'data.txt';
    my @array;

    if (open(my $fh, '<', $file)) {
        while (<$fh>) {
            chomp;
            push @array, getn_split($_, 10);
        }
    } else {
        warn "Could not open file '$file' $!\n";
    }

    print Dumper \@array;

    __END__

    $ cat data.txt
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA

    $ perl test.pl
    $VAR1 = [
              'C',
              'A'
            ];
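For completeness, here is a sketch of the same loop using substr, the third candidate from the benchmark. To keep it self-contained it reads from an in-memory filehandle holding the two sample lines instead of data.txt; that substitution is mine and is only there so the snippet runs without an external file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub getn_substr { return substr $_[0], $_[1]-1, 1; }

# Assumption: an in-memory filehandle stands in for data.txt
my $data = "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA\n"
         . "ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA\n";

open(my $fh, '<', \$data) or die "Could not open in-memory file: $!\n";

my @array;
while (my $line = <$fh>) {
    chomp $line;
    push @array, getn_substr($line, 10);   # 10th character of each line
}
close $fh;

print join(',', @array), "\n";   # prints "C,A"
```

To read a real file, replace the reference \$data in the open call with the file name.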

Hope this helps, BR

Seeking for Perl wisdom...on the process of learning...not there...yet!

Replies are listed 'Best First'.
Re^2: Faster and more efficient way to read a file vertically
by karlgoethebier (Abbot) on Nov 05, 2017 at 14:15 UTC
    "...It looks like the more efficient choice would be to use unpack..."

    I'm not so sure. As you wrote:

        $ perl test.pl
                      Rate unpack substr  split
        unpack 171232877/s     --   -23%   -31%
        substr 223713647/s    31%     --   -10%
        split  248138958/s    45%    11%     --

    Ergo:

        karls-mac-mini:monks karl$ perl -e 'printf ("%.1f\n", 248138958/171232877);'
        1.4

    As I wrote at Re^6: Question on Regex:

    "...use cmpthese, the results are sorted from slow to fast..."

    Sorry in advance if I did something wrong or missed something.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

      Hello karlgoethebier,

      You are absolutely right. I also read the Optional Exports section of the Benchmark documentation, where it is clearly stated:

          cmpthese ( COUNT, CODEHASHREF, [ STYLE ] )

          Optionally calls timethese(), then outputs comparison chart. This:

              cmpthese( -1, { a => "++\$i", b => "\$i *= 2" } ) ;

          outputs a chart like:

                     Rate    b    a
              b 2831802/s   -- -61%
              a 7208959/s 155%   --

      This chart is sorted from slowest to fastest, and shows the percent speed difference between each pair of tests. cmpthese can also be passed the data structure that timethese() returns:

      Thanks for correcting me; I will also update my answer. To be honest, though, I am rather surprised that unpack is slower than substr and split.

      Thanks again for your time and effort, BR.

      Seeking for Perl wisdom...on the process of learning...not there...yet!