in reply to Fast way to read from file
It seems that you are performing multiple reads from this large file; otherwise the performance wouldn't really be an issue. Is there any way you can batch up your reads so that you leverage a single scan through the file? For example, reading rows 1, 2, 4 and 16 by grabbing rows 1, 2 and 4 on the way to row 16 will be faster than scanning for each row individually.
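For what it's worth, here is a rough, untested sketch of that batching idea; the sub and variable names are mine, and rows are counted from 1:

    # Collect all the wanted rows in a single forward pass instead of
    # scanning the file once per row.
    sub read_rows_batch {
        my ($fh, @wanted) = @_;
        my %want   = map { $_ => 1 } @wanted;
        my ($last) = (sort { $b <=> $a } @wanted)[0];   # highest row we need
        my %found;

        my $row = 0;
        while (defined(my $line = <$fh>)) {
            $row++;
            $found{$row} = $line if $want{$row};
            last if $row >= $last;    # everything we wanted is behind us
        }
        return %found;    # row number => line
    }

    # e.g. my %rows = read_rows_batch($fh, 1, 2, 4, 16);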
If you have the memory and are doing enough row accesses, another approach would be to save the offset of each line the first time you pass through that part of the file. The next time you try to retrieve a row you've already seen, you can jump right to it. I'm thinking something like:
    my @offsets;

    # Position the filehandle so that $$row_r lines have been consumed,
    # recording the byte offset of every line we pass on the way.
    sub Set_line (\*\$\$) {
        my $fh_r  = shift @_;   # reference to the filehandle glob
        my $cur_r = shift @_;   # ref to the number of lines consumed so far
        my $row_r = shift @_;   # ref to the target row number

        if (defined $offsets[$$row_r]) {
            # We know where the row is, so just go there.
            seek($$fh_r, $offsets[$$row_r], 0);
            $$cur_r = $$row_r;
        }
        elsif ($$cur_r > $$row_r) {
            # We don't know where it is, so start again from the end of
            # the area that we've already indexed.
            seek($$fh_r, $offsets[-1], 0);
            $$cur_r = $#offsets;
        }
        while ($$cur_r != $$row_r) {
            my $line = readline($$fh_r);
            last unless defined $line;          # ran off the end of the file
            $$cur_r++;
            $offsets[$$cur_r] = tell($$fh_r);   # where the next row starts
        }
        return;
    }

Or something like that. Kind of on-the-fly indexing. This code isn't tested, but the idea is there. It should give you some performance improvement at the cost of some memory. Or you could tie the offset array to a file to build a persistent index for future use.
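If you wanted to go the persistent route, DB_File's RECNO format will tie an array straight to a file, so the index is still there the next time the script runs. This is only a sketch, and "bigfile.idx" is just a placeholder name:

    use DB_File;
    use Fcntl;

    # In place of the plain "my @offsets;" above: tie the array to a
    # file so the index survives between runs.
    tie my @offsets, 'DB_File', 'bigfile.idx', O_RDWR | O_CREAT, 0666, $DB_RECNO
        or die "Cannot tie index file: $!";

    # Everything else works as before, except that each assignment to
    # $offsets[...] is written through to bigfile.idx on disk.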