in reply to Fast way to read from file
It seems that you are performing multiple reads from this large file; otherwise the performance wouldn't really be an issue. Is there any way you can batch up your reads so that you leverage a single scan through the file? For example, reading rows 1, 2, 4 and 16 by grabbing rows 1, 2 and 4 on the way to row 16 will be faster than scanning for each row individually.
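For what it's worth, here is a rough, untested sketch of that batching idea; the sub and variable names are mine, and rows are counted from 1:

    # Collect all the wanted rows in a single forward pass instead of
    # scanning the file once per row.
    sub read_rows_batch {
        my ($fh, @wanted) = @_;
        my %want   = map { $_ => 1 } @wanted;
        my ($last) = (sort { $b <=> $a } @wanted)[0];   # highest row we need
        my %found;

        my $row = 0;
        while (defined(my $line = <$fh>)) {
            $row++;
            $found{$row} = $line if $want{$row};
            last if $row >= $last;    # everything we wanted is behind us
        }
        return %found;    # row number => line
    }

    # e.g. my %rows = read_rows_batch($fh, 1, 2, 4, 16);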
If you have the memory and are doing enough row accesses, another approach would be to save the offset of each line the first time you pass through that part of the file. The next time you try to retrieve a row you've already seen, you can jump right to it. I'm thinking something like:
    my @offsets;

    # Position the filehandle so that $$row_r lines have been consumed,
    # recording the byte offset of every line we pass on the way.
    sub Set_line (\*\$\$) {
        my $fh_r  = shift @_;   # reference to the filehandle glob
        my $cur_r = shift @_;   # ref to the number of lines consumed so far
        my $row_r = shift @_;   # ref to the target row number

        if (defined $offsets[$$row_r]) {
            # We know where the row is, so just go there.
            seek($$fh_r, $offsets[$$row_r], 0);
            $$cur_r = $$row_r;
        }
        elsif ($$cur_r > $$row_r) {
            # We don't know where it is, so start again from the end of
            # the area that we've already indexed.
            seek($$fh_r, $offsets[-1], 0);
            $$cur_r = $#offsets;
        }
        while ($$cur_r != $$row_r) {
            my $line = readline($$fh_r);
            last unless defined $line;          # ran off the end of the file
            $$cur_r++;
            $offsets[$$cur_r] = tell($$fh_r);   # where the next row starts
        }
        return;
    }

Or something like that. Kind of on-the-fly indexing. This code isn't tested, but the idea is there. It should give you some performance improvement at the cost of some memory. Or you could tie the offset array to a file to build a persistent index for future use.
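If you wanted to go the persistent route, DB_File's RECNO format will tie an array straight to a file, so the index is still there the next time the script runs. This is only a sketch, and "bigfile.idx" is just a placeholder name:

    use DB_File;
    use Fcntl;

    # In place of the plain "my @offsets;" above: tie the array to a
    # file so the index survives between runs.
    tie my @offsets, 'DB_File', 'bigfile.idx', O_RDWR | O_CREAT, 0666, $DB_RECNO
        or die "Cannot tie index file: $!";

    # Everything else works as before, except that each assignment to
    # $offsets[...] is written through to bigfile.idx on disk.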