in reply to Re: opening a file in a subroutine
in thread opening a file in a subroutine

Nope.

This code suffers from the same bug as the other code.

Notice that he said perl 5.6 on Win2k.


Updated

Also tested other solutions; with 5.6 on Win98, this yielded the best performance.

Please show your benchmarks.

use strict;
use warnings;
use Benchmark 'cmpthese';

my %subs = (
    list_io => sub {
        # Second time round this will take
        # a loooooong time.
        my $file = $0;
        my @line;
        open my($fh), $file;
        @line = <$fh>;
        close($fh);
        return \@line;
    },
    split_slurp => sub {
        my $file = $0;
        my @line;
        local $/;
        open my($fh), $file;
        @line = split /\n/, <$fh>;
        close($fh);
        return \@line;
    },
    while_io => sub {
        my $file = $0;
        my @line;
        open my($fh), $file;
        push @line, $_ while <$fh>;
        close($fh);
        return \@line;
    },
);

cmpthese -5, \%subs;
__END__
Benchmark: running list_io, split_slurp, while_io, each for at least 5 CPU seconds...
    list_io:  5 wallclock secs ( 3.89 usr + 1.34 sys = 5.24 CPU) @ 1721.98/s (n=9018)
split_slurp:  4 wallclock secs ( 3.02 usr + 2.21 sys = 5.23 CPU) @ 2725.38/s (n=14251)
   while_io:  7 wallclock secs ( 3.64 usr + 1.38 sys = 5.02 CPU) @ 1629.26/s (n=8174)
              Rate while_io list_io split_slurp
while_io    1629/s       --     -5%        -40%
list_io     1722/s       6%      --        -37%
split_slurp 2725/s      67%     58%          --
Don't get confused and think that local $/, reading the file in, and then splitting would improve performance.

In light of the evidence I think you will have to reconsider.

(As I will explain, whether you use scalar context does not have much to do with performance.

That depends on what you mean. Reading in smaller chunks at a time reduces memory overhead and can thus have a significant effect on run time.
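For illustration, a minimal sketch of reading in smaller chunks (the 64KB buffer size and the use of $0 as the input file are arbitrary choices here):

use strict;
use warnings;

open my $fh, '<', $0 or die "open: $!";
binmode $fh;
while ( read( $fh, my $chunk, 64 * 1024 ) ) {
    # process $chunk; only one buffer's worth of data is in memory at a time
}
close $fh;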

On the other hand, keep in mind that split is not free; it has to walk through the whole string. As anyone with a C background would know, string operations hurt performance a lot, especially this kind of operation, which involves going from head to toe.)

You would think so at first glance. As I said, the evidence contradicts you.

Perhaps it's due to perl being able to allocate one buffer, sysread the lot, and then walk the string. It may in fact be that this is more efficient than reading whatever the standard-size buffer is for PerlIO, scanning it for newlines, then reading another buffer... (assuming, of course, that memory is available).
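A rough sketch of that one-buffer idea expressed at the Perl level ($0 is just a convenient test file; this is speculation about what might happen internally, not a claim about what PerlIO actually does):

use strict;
use warnings;

open my $fh, '<', $0 or die "open: $!";
binmode $fh;
my $size = -s $fh;
defined sysread( $fh, my $buf, $size ) or die "sysread: $!";   # one read into one buffer
my @lines = split /\n/, $buf;                                  # split then walks the whole string in memory
close $fh;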

First layer, the physical reading layer: Perl reads in block by block; it doesn't matter whether your code requires scalar context or array context. This makes sense, as it is optimized for Perl's own performance.

By "block by block" presumably you mean buffer by buffer.

Second layer, the layer between Perl and your program: Perl presents the data in the right context, as you required. This layer doesn't involve physical devices, and is much less related to performance than the first layer is.

You have the return-type part of context mixed up with the actions that the context causes to happen. In list context the IO operator does something different from what it does in scalar context. In list context it causes the entire file to eventually be read into memory and sliced up into chunks as specified by $/.

In scalar context it reads enough to provide a single chunk at a time. If the chunks are small it may buffer more than one chunk in memory, but it doesn't necessarily load them all.
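A minimal sketch of the difference, using $0 (the script itself) as a convenient test file:

use strict;
use warnings;

open my $fh, '<', $0 or die "open: $!";

# List context: readline keeps going until EOF, so every record
# (as delimited by $/) eventually ends up in memory at once.
my @all_lines = <$fh>;

seek $fh, 0, 0;

# Scalar context: one record per call; Perl only needs to buffer enough
# of the file to hand back the next chunk.
while ( defined( my $line = <$fh> ) ) {
    # process $line here without holding every record at once
}
close $fh;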

I will agree that I was surprised myself about these results. But you can't just say that something works the way it does, and that it should be faster, because you think so. Unless you have pored over the PerlIO code, and unless you have benchmarked the issue in question rather exhaustively, you have no way to know how fast something is going to run in Perl.

--- demerphq
my friends call me, usually because I'm late....

Re^3: opening a file in a subroutine
by pg (Canon) on Feb 09, 2003 at 22:54 UTC
    However, I do clearly see a positive sign here: your idea is changing. It is good for everyone to improve themselves through discussions and to change their ideas. I do, you do, everyone does; that's why we all love this site!!
      The benchmark you did is misleading, and totally irrelevant to what I said.

      Please go back and actually read the benchmark code in the little readmore tag there.

      1. Your code (not exact; modified for clearer comparison):

      while (<FILE>) { push @lines, $_; }

      This is a moderately less efficient version of the code in $subs{while_io}. The code in particular is

      push @line, $_ while <$fh>;
      2. My code,
      @lines = <FILE>;
      This is the code in $subs{list_io}. The code in particular is
      @line = <$fh>;
      They are two different animals: my version slurps the file, while your version does not. That is where the performance difference is.

      Yes, correct. The code in $subs{split_slurp} is measurably faster than either of the above solutions (for files that fit in memory, and on my machine here). In addition, both methods you mention are benchmarked. The second, on small files, moderately outperforms the first, and split_slurp outperforms them both under similar conditions. Only while_io is suitable for really large files. And as the other posts mention, at least on Win32 Perl 5.6, list_io hangs on large files.

      Your benchmark data is not relevant to my solution at all.

      Your solution (insofar as <> in list context can be called your solution) is benchmarked alongside two others. It is slower than one of the others according to the benchmark. What more is there to be said?

      Benchmark: running list_io, split_slurp, while_io, each for at least 5 CPU seconds...
                    Rate while_io list_io split_slurp
      while_io    1629/s       --     -5%        -40%
      list_io     1722/s       6%      --        -37%
      split_slurp 2725/s      67%     58%          --
      Update

      Please don't add stuff to nodes without putting an update note on them.

      As for the size/slurp issue, my slurp is localized. It won't infect the rest of a script. Furthermore, the entire way through this discussion we have been discussing files that can be processed in memory, i.e. the 8.5MB Word file, or $0 of the benchmark itself. The other use was a solution to do something in someone's code where the effects of the localization, and of reading in such a large file and then evaling it (!!?), practically required a notice that it was a bad solution. Here matters are much more constrained, and as such slurping isn't out of line. Consider that the OP is using it happily right now (apparently).
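      For reference, a minimal sketch of what "localized" means here ($0 stands in for a real file): $/ is only undefined inside the block, so the rest of the script still reads record by record.

      my @lines = do {
          local $/;                                  # slurp mode, scoped to this block
          open my $fh, '<', $0 or die "open: $!";
          split /\n/, <$fh>;                         # with $/ undef, <$fh> reads the whole file in one go
      };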

      Update 2

      The original node there has been completely rewritten three times.

      That's way out of line, dude. The community here is about sharing knowledge and having a good time. I posted a benchmark. You updated your node with a bunch of stuff that is demonstrably untrue in at least one instance. I called you on it. Fine. Somebody out there might learn something from this exchange. Get over it, and don't rewrite history because you made a mistake. If everyone did that the archives would be full of holes. Hell, there are a few nodes I wish I hadn't written. They are still here.

      Anyway, since you're rewriting history today, this conversation is over.

      --- demerphq
      my friends call me, usually because I'm late....