in reply to Re: Re: opening a file in a subroutine
in thread opening a file in a subroutine

My guess is that the code that you are actually using looks more like
sub read_file {
    my $file = shift;
    open my $fh, $file or die "$file : $!";
    my @lines = <$fh>;
    return \@lines;
}
In which case it's storing the data in memory, which is perhaps overflowing your available physical RAM. When that happens the OS starts swapping the memory out to disk (obviously a slow operation), which, if it happens enough, leads to a condition called thrashing, where the OS is basically just moving memory back and forth from the disk, and the time taken for the repeated swapping completely overwhelms the time your code takes, making your code look like it hangs.

Try using some memory monitoring tool, or do some program analysis, to see exactly how much memory you are actually using (a small measuring sketch follows the example below). Hashes, for instance, consume far more memory than they appear to. So does careless array manipulation. For instance, if you have
my @array;
$array[100_000_000] = 1;
then perl will have to allocate sufficient RAM for all 100 million slots, even though they aren't used.
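
To see that in numbers, here is a minimal measuring sketch using Devel::Size; the module is my suggestion (assuming it is installed) and was not part of the original post:

use strict;
use warnings;
use Devel::Size qw(total_size);

my @array;
$array[100_000_000] = 1;

# total_size() walks the structure and reports the bytes Perl actually
# allocated, including the millions of slots that were never assigned.
print total_size(\@array), " bytes\n";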

Without seeing the real code (as I said, I don't think what you posted properly reflects what you are doing), it's hard to say for certain.

Regards,

UPDATE: Since you are on Win2k you should take advantage of the Task manager and the Profiler that come with the OS. (Start -> Settings -> Control Panel -> Administrative Tools -> Profiler)

--- demerphq
my friends call me, usually because I'm late....

Re: Re: Re: Re: opening a file in a subroutine
by abhishes (Friar) on Feb 09, 2003 at 16:06 UTC
    Hello Demerphq,

    The code which I have posted is quite close to what I am doing. My application has a couple of log files. I tried to write a generic function which will open a log file and return all its contents. However, I realized my application slows down a lot when the second call to the log reader function is made.

    Then I wrote this dummy program. The function here and the one in my application are the same.

    I tried to optimize the code by passing a reference to the @lines array to the function and using the same reference everywhere, but that did not help. My machine has 512 MB of physical RAM and a 756 MB swap pagefile, so I don't think that hardware or swap is an issue. Just to check if the OS is s*wing up, I wrote the same function in C#. There, all three function calls took 5 seconds each. So why is it that in Perl the second and third calls take so long? I am still confused.

    My updated code looks like this:
    use strict;
    use warnings;

    print "opening file 1\n";
    my @lines;
    myfile(\@lines);
    @lines = {};

    print "opening file 2\n";
    myfile(\@lines);
    @lines = {};

    print "opening file 3\n";
    myfile(\@lines);
    @lines = {};

    sub myfile {
        my ($line) = @_;
        open my $fh, "xml.log";
        @{$line} = <$fh>;
        close($fh);
    }
    regards, Abhishek.
      *blushes*

      The first and second times are expected to be different; the third should be the same as the second. The reason for the difference is that the first time all it has to do is allocate the space for the array and the strings. The second and third tries have to deallocate first and then suffer the same allocation overhead.

      However, I believe you have stumbled on a bug. When I change the context to scalar, the times are predictable. For instance, with your code on an 8.5MB file it takes 13 seconds for the first try and more minutes than I am willing to wait on the next two. When I replace it with a slurp and a split it becomes 2 seconds and 5 seconds. When I replace it with a while (<$fh>) loop it goes to 4 and 10. Here's the code:

      use strict;
      use warnings;
      $|++;

      my @lines;
      for my $try (1..3) {
          print "Try $try\n";
          my $start = time;
          @lines = ();
          # 6 million ways to die, choose one:
          # my_slurp(\@lines);
          # my_while(\@lines);
          # my_file(\@lines);
          print "Took ".(time-$start)." seconds.\n";
      }

      sub my_file {
          my ($line) = @_;
          open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
          @{$line} = <$fh>;
          close($fh);
      }

      sub my_slurp {
          my ($line) = @_;
          local $/;
          open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
          @{$line} = split /\n/, <$fh>;
          close($fh);
      }

      sub my_while {
          my ($line) = @_;
          open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
          push @$line, $_ while <$fh>;
          close($fh);
      }

      __END__
      my_file:
      Try 1
      Took 13 seconds.
      Try 2
      ^C (I killed it after 5 minutes.)

      my_while:
      Try 1
      Took 4 seconds.
      Try 2
      Took 10 seconds.
      Try 3
      Took 9 seconds.

      my_slurp:
      Try 1
      Took 2 seconds.
      Try 2
      Took 6 seconds.
      Try 3
      Took 5 seconds.
      Hopefully someone with 5.8 or bleadperl can check whether this is a resolved bug; if not, it definitely should be reported.

      Sorry I didn't try your code properly on my first reply. :-)

      Don't forget that the resolution of the timings is at least ±1 second.
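
      If finer than one-second resolution is wanted, Time::HiRes is one option; this variation of the timing lines is my own addition and was not part of the original benchmark:

      use Time::HiRes qw(time);              # import a floating-point time() in place of the built-in

      my $start = time;
      my_slurp(\@lines);                     # or whichever variant is being measured
      printf "Took %.3f seconds.\n", time - $start;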

      Also, have you considered that maybe there is a better strategy than reading the whole file in three times? Do you have to read it all in at once? Why can't you just copy the data once it's read (see the sketch below)? This doesn't resolve the bug, but it sounds like some design improvements are not out of order.
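
      A minimal sketch of that idea, reusing the read_file sub from the top of this thread; the file name is just a placeholder:

      my $lines = read_file('xml.log');      # hit the disk only once
      my @copy1 = @$lines;                   # copying the in-memory array is far cheaper
      my @copy2 = @$lines;                   # than re-reading the file for every consumer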

      --- demerphq
      my friends call me, usually because I'm late....

        WOOOOOOWWWW !!!!!

        The call local $/; did the magic!!! As soon as I put that into my function, my code is flying now.

        Thank you so much for the slurp function .... Now it's really really fast!!!

        BTW, what did it do? I don't understand the meaning of local $/.

        regards,
        Abhishek.
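
        For context on the question above: $/ is Perl's input record separator, and local $/; undefines it for the enclosing scope, so a single <$fh> read returns the whole file instead of one line. A minimal sketch (the file name is only an example):

        use strict;
        use warnings;

        open my $fh, 'xml.log' or die "xml.log: $!";
        my $whole = do { local $/; <$fh> };     # $/ is undef inside the do block: one read slurps the file
        my @lines = split /\n/, $whole;         # same lines as before, without millions of separate reads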

        OK, I tested this with v5.8.0 built for i386-linux-thread-multi.

        Test data was a 32MB, 2.5 million line file.
        Machine is a 1740MHz Athlon, 512MB real memory, 768MB swap.

        Results:

        my_slurp:
        Try 1
        Took 11 seconds.
        Try 2
        Took 16 seconds.
        Try 3
        Took 15 seconds.

        my_while:
        Try 1
        Took 4 seconds.
        Try 2
        Took 5 seconds.
        Try 3
        Took 4 seconds.

        my_file:
        Try 1
        Took 33 seconds.
        Try 2
        Took 19 seconds.
        Try 3
        Took 14 seconds.

        If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
        That way everyone learns.

      Sorry, I can't help you directly with your issue at hand - your program works fine for me, even on bigger files. But allow me to make two remarks:
      • @lines = {}; does not empty the array; it assigns a single empty hash reference to @lines, leaving a one-element array. What you want is probably @lines = (); (with parentheses, not curly braces), which does clear the array.
      • Your sub is written in a strange way. Normally you would return a reference to a lexically created array, like this, instead of passing in an arrayref and modifying it:
        sub myfile {
            my $file = shift;
            open my $fh, $file or die "Couldn't open '$file': $!";
            my @lines = <$fh>;
            close $fh or die "Couldn't close '$file': $!";
            return \@lines;
        }

        # and call it like this:
        my @lines = @{ myfile('xml.log') };

      Update: If it is a bug, as demerphq suspects, then it appears to be OS dependent - I get the same times here on all opens (perl v5.6.0 built for sun4solaris).

      -- Hofmator