in reply to Re: Re: Re: opening a file in a subroutine
in thread opening a file in a subroutine

Hello Demerphq,

The code which I have posted is quite close to what I am doing. My application has a couple of log files. I tried to write a generic function which will open a log file and return all its contents. However, I realized my application slows down a lot when the second call to the log-reader function is made.

Then I wrote this dummy program. The function here and the one in my application are the same.

I tried to optimize the code by passing a reference to the @lines array to the function and using the same reference everywhere, but that did not help. My machine has 512 MB physical RAM and 756 MB swap pagefile, so I don't think that hardware or swap is an issue. Just to check whether the OS is s*wing up, I wrote the same function in C#. There all 3 function calls took 5 seconds each. So why is it that in Perl the second and third calls take so long? I am still confused.

My updated code looks like this:

    use strict;
    use warnings;

    print "opening file 1\n";
    my @lines;
    myfile(\@lines);
    @lines = {};

    print "opening file 2\n";
    myfile(\@lines);
    @lines = {};

    print "opening file 3\n";
    myfile(\@lines);
    @lines = {};

    sub myfile {
        my ($line) = @_;
        open my $fh, "xml.log";
        @{$line} = <$fh>;
        close($fh);
    }
regards, Abhishek.

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: opening a file in a subroutine
by demerphq (Chancellor) on Feb 09, 2003 at 16:43 UTC
    *blushes*

    The first and second times are expected to differ; the third should be the same as the second. The reason for the difference is that on the first call all Perl has to do is allocate the space for the array and the strings. The second and third tries have to deallocate that space first and then suffer the same allocation overhead.

    However, I believe you have stumbled on a bug. When I change the context to scalar, the times are predictable. For instance, with your code on an 8.5MB file, the first try takes 13 seconds and the next two take more minutes than I am willing to wait. When I replace it with a slurp and a split it becomes 2 seconds and 5 seconds. When I replace it with a while (<$fh>) loop it goes to 4 and 10. Here's the code:

    use strict;
    use warnings;
    $|++;

    my @lines;
    for my $try (1..3) {
        print "Try $try\n";
        my $start = time;
        @lines = ();
        # 6 million ways to die, choose one:
        # my_slurp(\@lines);
        # my_while(\@lines);
        # my_file(\@lines);
        print "Took ".(time-$start)." seconds.\n";
    }

    sub my_file {
        my ($line) = @_;
        open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
        @{$line} = <$fh>;
        close($fh);
    }

    sub my_slurp {
        my ($line) = @_;
        local $/;
        open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
        @{$line} = split /\n/, <$fh>;
        close($fh);
    }

    sub my_while {
        my ($line) = @_;
        open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
        push @$line, $_ while <$fh>;
        close($fh);
    }

    __END__
    my_file:
    Try 1
    Took 13 seconds.
    Try 2
    ^C (I killed it after 5 minutes.)

    my_while:
    Try 1
    Took 4 seconds.
    Try 2
    Took 10 seconds.
    Try 3
    Took 9 seconds.

    my_slurp:
    Try 1
    Took 2 seconds.
    Try 2
    Took 6 seconds.
    Try 3
    Took 5 seconds.
    Hopefully someone with 5.8 or bleadperl can check whether this is a resolved bug; if not, it definitely should be reported.

    Sorry I didn't try your code properly in my first reply. :-)

    Don't forget that the resolution of the timings is at least ±1 second.

    Also, have you considered that maybe there is a better strategy than reading the whole file in three times? Do you have to read it all in at once? Why can't you just copy the data once it's read? This doesn't resolve the bug, but it sounds like design improvements are not out of order.
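    One way to sketch that "read once, copy afterwards" idea; the read_log helper and its %cache are made up for illustration, not from the thread:

```perl
use strict;
use warnings;

my %cache;   # filename => arrayref of lines, filled on first read

# Read a log file from disk only once; later calls return a fresh
# copy of the cached lines so callers can mangle it freely.
sub read_log {
    my ($file) = @_;
    unless ($cache{$file}) {
        open my $fh, '<', $file or die "Couldn't open '$file': $!";
        $cache{$file} = [ <$fh> ];
        close $fh;
    }
    return [ @{ $cache{$file} } ];   # shallow copy, not the cached arrayref
}
```

    Each caller gets its own copy of the lines, and the disk is only hit on the first call per file.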

    --- demerphq
    my friends call me, usually because I'm late....

      WOOOOOOWWWW !!!!!

      The local $/; call did the magic!!! As soon as I put that into my function, my code is flying now.

      Thank you so much for the slurp function .... Now its really really fast!!!

      BTW, what did it do? I don't understand the meaning of local $/.

      regards,
      Abhishek.
        Er, hold on there... Don't be too hasty.

        $/ is the input record separator. It defaults to "\n".

        What my code, which is in essence

            my @list = split /\n/, do { local $/; <$fh> };

        does is read the whole file into a buffer, split it up by newlines, and return the pieces, which should be more or less the same as

            my @list = <$fh>;

        (at least under default conditions) but obviously isn't on AS 5.6.

        I hope you didn't do it naively, as otherwise you'll only have one entry in the array, which will contain the whole file.
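        A quick way to see the difference, using a throwaway three-line test file (the file and variable names here are made up for illustration):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# build a small test file
my ($fh, $file) = tempfile();
print $fh "one\ntwo\nthree\n";
close $fh;

# naive slurp in list context under local $/: ONE element, the whole file
open my $in, '<', $file or die "Couldn't open '$file': $!";
my @naive = do { local $/; <$in> };
close $in;

# slurp-and-split: one element per line, like plain @list = <$fh>
open $in, '<', $file or die "Couldn't open '$file': $!";
my @split = split /\n/, do { local $/; <$in> };
close $in;

printf "naive: %d element(s); split: %d element(s)\n",
    scalar @naive, scalar @split;
# naive: 1 element(s); split: 3 element(s)
```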

        --- demerphq
        my friends call me, usually because I'm late....

      OK, I tested this with v5.8.0 built for i386-linux-thread-multi.

      Test data was a 32MB, 2.5-million-line file.
      Machine is a 1740MHz Athlon, 512MB real memory, 768MB swap.

      Results:

      my_slurp:  Try 1 took 11 seconds.  Try 2 took 16 seconds.  Try 3 took 15 seconds.
      my_while:  Try 1 took  4 seconds.  Try 2 took  5 seconds.  Try 3 took  4 seconds.
      my_file:   Try 1 took 33 seconds.  Try 2 took 19 seconds.  Try 3 took 14 seconds.

      If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
      That way everyone learns.

Re5: opening a file in a subroutine
by Hofmator (Curate) on Feb 09, 2003 at 16:57 UTC
    Sorry, I can't help you directly with your issue at hand - your program works fine for me, even on bigger files. But allow me to make two remarks:
    • @lines = {}; assigns a reference to an empty anonymous hash to the array @lines, leaving it with one element. What you probably want is @lines = (); (with parentheses, not curly braces), which clears the array.
    • Your sub is written in a strange way. Normally you would return a reference to a lexically created array, like this, instead of passing in an arrayref and modifying it:
      sub myfile {
          my $file = shift;
          open my $fh, $file or die "Couldn't open '$file': $!";
          my @lines = <$fh>;
          close $fh or die "Couldn't close '$file': $!";
          return \@lines;
      }

      # and call it like this:
      my @lines = @{ myfile('xml.log') };
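    The first remark can be checked in a few lines; this snippet is just an illustration of the pitfall:

```perl
use strict;
use warnings;

my @lines = (1, 2, 3);

@lines = {};   # oops: the array now holds ONE element, an empty hash reference
print scalar(@lines), "\n";   # prints 1
print ref($lines[0]), "\n";   # prints HASH

@lines = ();   # correct: the array is now empty
print scalar(@lines), "\n";   # prints 0
```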

    Update: If it's a bug, as demerphq suspects, then it is probably OS dependent - I get the same times here for all opens (perl v5.6.0 built for sun4solaris).

    -- Hofmator