It seems to me that you are already taking the fastest approach possible, given that the rows in your file vary in length. If every row were the same length, you could compute the byte offset of any row directly (row number times row length) and seek straight to it, with no scanning at all.
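To illustrate the fixed-length case, here is a minimal sketch. The helper name `read_fixed_row` and its parameters are made up for this example, and it assumes every row, newline included, occupies exactly `$reclen` bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: return row $row (counting from 0) from a file whose
# rows are all exactly $reclen bytes long, newline included.
# Returns undef if the row lies past the end of the file.
sub read_fixed_row {
    my ($fh, $row, $reclen) = @_;
    seek($fh, $row * $reclen, 0) or return;    # 0 means "from start of file"
    my $got = read($fh, my $buf, $reclen);
    return unless defined $got && $got == $reclen;
    return $buf;
}
```

Something like `read_fixed_row($fh, 15, 80)` would then cost one seek and one read, no matter how large the file is.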

It seems that you are performing multiple reads from this large file, otherwise the performance wouldn't really be an issue. Is there any way that you can batch up your reads so that you can leverage the scan through the file? For example, reading rows 1, 2, 4 and 16 by grabbing rows 1, 2 and 4 on the way to getting 16 will be faster than reading each of the rows individually.
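A rough sketch of that batching idea follows. `read_rows` is a name invented for this example; it numbers rows from 1 to match the example above and assumes you know up front which rows you want:

```perl
use strict;
use warnings;

# Hypothetical helper: fetch several rows (numbered from 1) in one forward
# scan. Returns a hash ref mapping row number => line. Rows that are not in
# the file are simply absent from the result.
sub read_rows {
    my ($fh, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my $left = keys %want;          # number of distinct rows still to find
    my %line_for;
    my $n = 0;

    seek($fh, 0, 0);                # always start the scan at the top
    while ($left && defined(my $line = readline($fh))) {
        $n++;
        next unless $want{$n};
        $line_for{$n} = $line;
        $left--;                    # the scan stops once every row is found
    }
    return \%line_for;
}
```

With `read_rows($fh, 1, 2, 4, 16)` the file is read only as far as row 16, once, instead of being rescanned for each row.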

If you have the memory and are doing enough row access, another approach would be to save the offset of each line in the file the first time through that part of the file so that the next time you try to retrieve a row that you've already seen, you can jump right to that row. I'm thinking something like:

my @offsets = (0);  # $offsets[$n] = byte offset where row $n starts (rows count from 0)

# Position $fh at the start of row $row and update the caller's current-row
# counter through the reference $cur_r. Returns false if the file ends first.
sub Set_line {
  my ($fh, $cur_r, $row) = @_;

  if (defined $offsets[$row]) {
    # We know where the row is, so just go there.
    seek($fh, $offsets[$row], 0);
    $$cur_r = $row;
    return 1;
  }
  # We don't know where it is, so start at the end of the area that we've indexed.
  seek($fh, $offsets[-1], 0);
  $$cur_r = $#offsets;
  while ($$cur_r < $row) {
    defined(readline($fh)) or return 0;
    $offsets[++$$cur_r] = tell($fh);   # remember where the next row starts
  }
  return 1;
}

Or something like that. Kind of on-the-fly indexing. This code isn't tested, but the idea is there. Should give you some performance improvement at the cost of some memory. Or you could tie the offset array to a file to build a persistent index for future use.
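For the persistent-index idea, one possible sketch is below. Instead of a tied array it uses the core Storable module to save the offsets to disk, which is a swapped-in technique rather than exactly what I described. The helper names and the index file path are made up for illustration, and the code assumes the data file does not change after the index is built:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helper: scan the whole file once and return an array ref
# where element $n is the byte offset at which row $n (from 0) starts.
sub build_offsets {
    my ($fh) = @_;
    seek($fh, 0, 0);
    my @offsets = (0);
    push @offsets, tell($fh) while defined(readline($fh));
    pop @offsets;    # the last entry is end-of-file, not the start of a row
    return \@offsets;
}

# Hypothetical helper: reuse a saved index if one exists, else build and
# save it. Nothing here notices a stale index; delete the index file
# whenever the data file changes.
sub load_or_build_index {
    my ($fh, $index_file) = @_;
    return retrieve($index_file) if -e $index_file;
    my $offsets = build_offsets($fh);
    store($offsets, $index_file);
    return $offsets;
}
```

After `my $offsets = load_or_build_index($fh, 'rows.idx');`, a `seek($fh, $offsets->[$row], 0)` followed by `readline($fh)` fetches any row directly, and later runs skip the scan entirely.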

In reply to Re: Fast way to read from file by bschmer
in thread Fast way to read from file by Hena
