in reply to Re: opening a file in a subroutine
in thread opening a file in a subroutine

Thanks for your reply.

The file is only 1.5 MB. The first function call completes within a matter of seconds, but then the second call takes a very long time. I was wrong when I said that the program hangs... the second and third function calls do finish, but take about 5 minutes each. This doesn't make sense to me: if the first call took just 5-6 seconds, why do the second and third take so much longer?

Replies are listed 'Best First'.
Re: Re: Re: opening a file in a subroutine
by demerphq (Chancellor) on Feb 09, 2003 at 14:58 UTC
    My guess is that the code that you are actually using looks more like

        sub read_file {
            my $file = shift;
            open my $fh, $file or die "$file : $!";
            my @lines = <$fh>;
            return \@lines;
        }
    In which case it's storing the data in memory, which is perhaps overflowing your available physical RAM. When that happens the OS starts swapping memory out to disk (obviously a slow operation), which, if it happens enough, leads to a condition called thrashing: the OS is basically just moving memory back and forth from the disk, and the time taken for the repeated swapping completely overwhelms the time your code takes, making your code look like it hangs. Try using a memory monitoring tool, or do some program analysis to see exactly how much memory you are actually using. Hashes, for instance, consume far more memory than they appear to, as does careless array manipulation. For instance, if you have
        my @array;
        $array[100_000_000] = 1;
    then perl will have to allocate sufficient RAM for all 100 million slots, even though they aren't used.
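    A scaled-down sketch of the same effect (1,000 slots instead of 100 million, so it runs instantly): assigning to a single high index forces Perl to extend the array over every lower slot as well.

    ```perl
    use strict;
    use warnings;

    my @array;
    $array[1_000] = 1;              # one assignment...
    print scalar(@array), "\n";     # ...but the array now spans 1001 slots (prints 1001)
    ```

    Only the one slot holds a value; the rest are undef, but Perl still reserves room for all of them.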

    Without seeing real code (as I said, I don't think what you posted properly reflects what you are doing), it's hard to say for certain.

    Regards,

    UPDATE: Since you are on Win2k you should take advantage of the Task manager and the Profiler that come with the OS. (Start -> Settings -> Control Panel -> Administrative Tools -> Profiler)

    --- demerphq
    my friends call me, usually because I'm late....

      Hello Demerphq,

      The code which I have posted is quite close to what I am doing. My application has a couple of log files. I tried to write a generic function which opens a log file and returns all its contents. However, I realized my application slows down a lot when the second call to the log reader function is made.

      Then I wrote this dummy program. The function here and the one in my application are the same.

      I tried to optimize the code by passing a reference to the @lines array into the function and using the same reference everywhere, but that did not help. My machine has 512 MB physical RAM and a 756 MB swap pagefile, so I don't think that hardware or swap is an issue. Just to check if the OS was s*wing up, I wrote the same function in C#; there all three function calls took 5 seconds each. So why is it that in Perl the second and third calls take so long? I am still confused.

      My updated code looks like:

          use strict;
          use warnings;

          print "opening file 1\n";
          my @lines;
          myfile(\@lines);
          @lines = {};

          print "opening file 2\n";
          myfile(\@lines);
          @lines = {};

          print "opening file 3\n";
          myfile(\@lines);
          @lines = {};

          sub myfile {
              my ($line) = @_;
              open my $fh, "xml.log";
              @{$line} = <$fh>;
              close($fh);
          }
      regards, Abhishek.
        *blushes*

        The first and second times can be expected to differ; the third should be the same as the second. The reason for the difference is that on the first run all Perl has to do is allocate the space for the array and the strings. The second and third tries have to deallocate first and then suffer the same allocation overhead.

        However, I believe you have stumbled on a bug. When I change the context to scalar the times are predictable. For instance, with your code on an 8.5 MB file it takes 13 seconds for the first try and more minutes than I am willing to wait on the next two. When I replace it with a slurp and a split it becomes 2 seconds and 5 seconds. When I replace it with a while (<$fh>) loop it goes to 4 and 10. Here's the code:

        use strict;
        use warnings;
        $|++;

        my @lines;
        for my $try (1..3) {
            print "Try $try\n";
            my $start = time;
            @lines = ();
            # 6 million ways to die, choose one:
            # my_slurp(\@lines);
            # my_while(\@lines);
            # my_file(\@lines);
            print "Took " . (time - $start) . " seconds.\n";
        }

        sub my_file {
            my ($line) = @_;
            open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
            @{$line} = <$fh>;
            close($fh);
        }

        sub my_slurp {
            my ($line) = @_;
            local $/;
            open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
            @{$line} = split /\n/, <$fh>;
            close($fh);
        }

        sub my_while {
            my ($line) = @_;
            open my $fh, "D:\\Perl\\DevLib\\Scrabble\\dict.txt";
            push @$line, $_ while <$fh>;
            close($fh);
        }

        __END__
        my_file:
        Try 1
        Took 13 seconds.
        Try 2
        ^C (I killed it after 5 minutes.)

        my_while:
        Try 1
        Took 4 seconds.
        Try 2
        Took 10 seconds.
        Try 3
        Took 9 seconds.

        my_slurp:
        Try 1
        Took 2 seconds.
        Try 2
        Took 6 seconds.
        Try 3
        Took 5 seconds.
        Hopefully someone with 5.8 or bleadperl can check whether this is a resolved bug; if not, it definitely should be reported.

        Sorry I didn't try your code properly in my first reply. :-)

        Don't forget the resolution on the timings is at least ±1 second.
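        If one-second resolution is too coarse, the Time::HiRes module (core in recent perls, on CPAN otherwise) gives sub-second timings. A minimal sketch, with a trivial loop standing in for the file-reading code being measured:

        ```perl
        use strict;
        use warnings;
        use Time::HiRes qw(gettimeofday tv_interval);

        my $start = [gettimeofday];
        my $sum = 0;
        $sum += $_ for 1 .. 100_000;    # stand-in for the code under test
        printf "Took %.6f seconds\n", tv_interval($start);
        ```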

        Also, have you considered that maybe there is a better strategy than reading the whole file in three times? Do you have to read it all in at once? Why can't you just copy the data once it's read? This doesn't resolve the bug, but it sounds like design improvements are not out of order.
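        One way to sketch that "read once, reuse" idea (the file name and helper here are made up for illustration): slurp the log a single time, keep the arrayref around, and copy it in memory only when a caller genuinely needs its own modifiable version.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: slurp the file once and return an arrayref.
        sub read_file {
            my $file = shift;
            open my $fh, '<', $file or die "Couldn't open '$file': $!";
            my @lines = <$fh>;
            close $fh;
            return \@lines;
        }

        # Demo data so the sketch runs standalone.
        open my $out, '>', 'demo.log' or die "Couldn't write demo.log: $!";
        print $out "line $_\n" for 1 .. 3;
        close $out;

        my $lines = read_file('demo.log');  # disk is touched exactly once
        my @copy  = @$lines;                # later consumers copy in memory, no re-read
        print scalar(@copy), "\n";          # prints 3
        unlink 'demo.log';
        ```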

        --- demerphq
        my friends call me, usually because I'm late....

        Sorry, I can't help you directly with your issue at hand - your program works fine for me, even on bigger files. But allow me to give two remarks:
        • @lines = {}; assigns an empty hash reference to the array @lines, leaving it with one element. What you probably want is @lines = (); (with parentheses, not curly braces), which clears the array.
        • Your sub is written in a strange way. Normally you would return a reference to a lexically created array like this, instead of passing in an arrayref and modifying it:
          sub myfile {
              my $file = shift;
              open my $fh, $file or die "Couldn't open '$file': $!";
              my @lines = <$fh>;
              close $fh or die "Couldn't close '$file': $!";
              return \@lines;
          }

          # and call it like this:
          my @lines = @{ myfile('xml.log') };
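        A quick sketch making the difference between the two assignments concrete:

        ```perl
        use strict;
        use warnings;

        my @lines = (1, 2, 3);
        @lines = {};                 # assigns a list containing ONE empty hash reference
        print scalar(@lines), "\n";  # prints 1 -- the array is not empty!
        @lines = ();                 # assigns the empty list
        print scalar(@lines), "\n";  # prints 0 -- now it really is cleared
        ```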

        Update: If it's a bug, as demerphq suspects, then it may well be OS dependent: I get the same times here on all opens (perl v5.6.0 built for sun4solaris).

        -- Hofmator