
Re: Your second thought.

It's a fair point, but the stats show comparative differences, which means that only the first pass over the file is penalised: the file will then be in the cache for all subsequent passes.

In the case of the figures shown, the case affected was File::ReadBackwards (by virtue of Benchmark running the testcases in alphabetically sorted order by name). As File::ReadBackwards managed to process the file at least 700 times in the allotted 3 seconds of CPU, regardless of the file size, the effect of the penalty for putting the file into the cache on the first pass is minimal. However, to preclude the possibility of any effect, I added the following line at the top of the for loop

( undef ) = do{ local $/; open my $fh, '< :raw', $file or die $!; <$fh> };

so as to preload the cache. The results of the re-run (shown after the sketch below) were nearly identical, certainly within the bounds of normal variance.
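The harness itself isn't reproduced here, but as a minimal sketch of where the preload sits (with @files and %tests standing in as hypothetical names for the list of data files and the hash of testcase subs):

    use Benchmark qw( cmpthese );

    for my $file ( @files ) {    # e.g. data/500k.dat, data/1000k.dat, data/2MB.dat
        print "Comparing $file\n";

        # Preload the file into the OS cache so that no testcase pays
        # the first-pass disk penalty.
        ( undef ) = do{ local $/; open my $fh, '< :raw', $file or die $!; <$fh> };

        cmpthese( -3, \%tests ); # run each testcase for at least 3 CPU seconds
    }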

P:\test>354830

Comparing data/500k.dat
                        Rate Tie::File readfwd File::ReadBackwards   rawio
Tie::File             5.15/s        --    -95%                -99%   -100%
readfwd               94.4/s     1734%      --                -88%    -99%
File::ReadBackwards    783/s    15101%    729%                  --    -95%
rawio                15058/s   292394%  15852%               1824%      --

Comparing data/1000k.dat
                        Rate Tie::File readfwd File::ReadBackwards   rawio
Tie::File             2.50/s        --    -94%               -100%   -100%
readfwd               43.7/s     1650%      --                -95%   -100%
File::ReadBackwards    871/s    34777%   1893%                  --    -94%
rawio                14917/s   597126%  34023%               1612%      --

Comparing data/2MB.dat
                        Rate Tie::File readfwd File::ReadBackwards   rawio
Tie::File             1.24/s        --    -95%               -100%   -100%
readfwd               23.6/s     1797%      --                -98%   -100%
File::ReadBackwards   1051/s    84542%   4363%                  --    -93%
rawio                14894/s  1198858%  63119%               1316%      --

There's no real magic about why the differences are so great. Tie::File and the readfwd cases have to read the entire file to get the last line. Additionally, Tie::File is doing a huge amount of work under the covers, buffering the whole file through a limited buffer space and a hash. That extra work is incredibly useful when you are using it for the purposes for which it was designed, but fetching the last line of a file is not one of them.
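The readfwd testcase isn't shown above, but it amounts to something like this sketch: scan forward through every line, keeping only the most recent one.

    open my $fh, '<', $file or die $!;
    my $last;
    $last = $_ while <$fh>;    # after the loop, $last holds the final line
    close $fh;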

File::ReadBackwards skips to the end of the file and (unsurprisingly :) reads backwards in a similar fashion to the rawio case, but it carries the overhead of tie. It is also properly coded to handle the IO in a cross-platform manner and to handle any length of line, rather than relying on a hardcoded maximum line length and assuming that "\n" will do the 'right thing', as my crude rawio case does.
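For illustration only (my rawio testcase isn't reproduced here), the crude approach amounts to something like the following, with both of those assumptions baked in:

    use Fcntl qw( SEEK_SET SEEK_END );

    my $MAX_LINE = 4096;                   # hardcoded maximum line length
    open my $fh, '<:raw', $file or die $!;
    seek $fh, -$MAX_LINE, SEEK_END
        or seek $fh, 0, SEEK_SET;          # file shorter than $MAX_LINE bytes
    read $fh, my $tail, $MAX_LINE;
    my $last = ( split /\n/, $tail )[-1];  # assumes "\n" alone ends a line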

For production work where performance wasn't the ultimate criterion, I would use File::ReadBackwards in preference to trying to fix up the rawio case.
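Using it is about as simple as it gets:

    use File::ReadBackwards;

    my $bw   = File::ReadBackwards->new( $file )
                   or die "Can't open $file: $!";
    my $last = $bw->readline;    # last line of the file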


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Re: Reading from the end of a file.
by Abigail-II (Bishop) on May 20, 2004 at 23:05 UTC
    There's no need to do any "prerunning" to avoid penalties for a first run. If the first argument to timethese (and hence, to cmpthese) is negative, Benchmark will run the code for at least that number of CPU seconds. But in order to know how many times the code needs to be run, it will first run the code several times to get an indication of how often it needs to run to satisfy the requirement. So, any first-run penalties have already been paid.
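    The negative count alone suffices; a sketch (with the testcase bodies elided) would be:

        use Benchmark qw( cmpthese );

        # A negative first argument means "run each sub for at least that
        # many CPU seconds"; Benchmark makes preliminary calibration runs
        # first, so any first-run cost is paid before the timed runs begin.
        cmpthese( -3, {
            readfwd => sub { ... },    # testcase bodies elided
            rawio   => sub { ... },
        } );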

    Of course, if there's a significant difference between a first run and any subsequent runs, using the Benchmark module isn't very useful anyway.

    Abigail

      That's pretty much what I thought, though not in that detail. I had a vague recollection of reading somewhere that it always runs the code at least once, plus an 'empty loop', so that it can eliminate its own overhead from the timings; adding the preload just made sure.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail