Re: Faster and more efficient way to read a file vertically -- updated

Hello,

Millions of lines still probably fit in memory. Note that $#{$aoa[0]} assumes all lines have the same length, as you said.

use strict;
use warnings;

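my @aoa;
while (<DATA>) {
    chomp;
    # one row per line, one element per character
    push @aoa, [split '', $_];
}

# $#{$aoa[0]} is the last index of the first row: all rows must be equally long
foreach my $col (0 .. $#{$aoa[0]}) {
    print "Column $col: ",
          (join ' ', map { $aoa[$_][$col] } 0 .. $#aoa),
          "\n";
}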

__DATA__
ACATCACCTC
ACATCACCTC
ACATCACCTC
ACATCACCTC

# output

Column 0: A A A A
Column 1: C C C C
Column 2: A A A A
Column 3: T T T T
Column 4: C C C C
Column 5: A A A A
Column 6: C C C C
Column 7: C C C C
Column 8: T T T T
Column 9: C C C C
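If the lines are not all the same length, $#{$aoa[0]} only covers the columns of the first row. A minimal sketch of one way around that, assuming you want short rows padded: take the widest row as the column count and fill missing positions with a placeholder. The sample lines and the '-' placeholder are hypothetical choices, not from the original data.

use strict;
use warnings;

# Hypothetical ragged input: the lines do NOT all have the same length.
my @lines = ('ACATC', 'ACA', 'ACATCAC');

# Split each line into an array of single characters (array of arrays).
my @aoa = map { [ split '', $_ ] } @lines;

# The widest row decides how many columns there are.
my $max = 0;
for my $row (@aoa) {
    $max = @$row if @$row > $max;
}

# Collect each column, padding short rows with '-' (an arbitrary placeholder).
my @columns;
for my $col (0 .. $max - 1) {
    $columns[$col] = join ' ', map { $_->[$col] // '-' } @aoa;
    print "Column $col: $columns[$col]\n";
}

With these three sample lines, column 3 comes out as "T - T" and columns 5 and 6 exist only because of the longest line.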

UPDATE: if you really care about memory, you can try the following (*untested*) approach, which holds only one line in memory at a time:


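use strict;
use warnings;

my $file = shift @ARGV or die "usage: $0 FILE\n";
my $col  = 3;    # 0-based column to extract

open my $fh, '<', $file or die "Cannot open '$file': $!";

# analyze the first line
my $line = <$fh>;
die "'$file' is empty\n" unless defined $line;
chomp $line;
# compute the last valid index; the parentheses matter:
# "length $line - 1" would parse as length($line - 1)
my $last = length($line) - 1;
die "Column $col is out of range 0..$last\n" if $col > $last;
# rewind the filehandle so the first line is processed too
seek $fh, 0, 0 or die "Cannot seek: $!";

# return the character at 0-based position $col
# (substr avoids the off-by-one-prone regex special cases)
sub get_column {
    my ($col, $line) = @_;
    return substr $line, $col, 1;
}

# only the current line is held in memory
while (<$fh>) {
    chomp;
    print get_column($col, $_);
}
print "\n";
close $fh;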

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.


Re^2: Faster and more efficient way to read a file vertically -- updated by Laurent_R (Canon) on Nov 03, 2017 at 17:59 UTC
"Millions of lines still probably fit in memory." Maybe. Or maybe not. But why take the chance? Especially with an AoA, which has some extra overhead. It is so easy to do everything in the first loop, while reading each line. And BTW, it is probably also faster, because using an array of arrays means copying the data one more time.
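One way to read "do everything in the first loop" is sketched below, assuming the goal is still to collect every column: each character is appended to its column's string as the line is read, so no array of arrays is built. The in-memory sample text is hypothetical. This still keeps all characters in memory, but as one string per column instead of one scalar per character; if you only need per-column counts or a single column, the same loop can do that work directly without storing anything.

use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle.
my $text = "ACATCACCTC\n" x 4;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @columns;    # $columns[$i] collects every character found at position $i

while (my $line = <$fh>) {
    chomp $line;
    # append each character of this line to its column's string
    $columns[$_] .= substr($line, $_, 1) for 0 .. length($line) - 1;
}
close $fh;

for my $col (0 .. $#columns) {
    print "Column $col: ", join(' ', split //, $columns[$col]), "\n";
}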
Re^3: Faster and more efficient way to read a file vertically -- updated by Discipulus (Canon) on Nov 03, 2017 at 18:05 UTC
Yes Laurent_R, you are absolutely right, and I probably gave a dumb answer. I did not even look at the others' replies carefully before posting; my only excuse is that I was filling the bathtub.. ;=) If the data must be accessed several times, it is probably worth putting it into an SQLite database, one character per column, and accessing it via SQL queries. No big memory overhead and fast queries.