punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Madcap Monks,

I often write small scripts to do simple processing on large files (~1 GB, ~18 million lines). These scripts are launched from a link on an admin web page, so I usually include a print statement to show progress.

Running the script below, I expected to see ", $respondent" fill my screen repeatedly as the script went through the file. Instead, the browser just sat there until it timed out and gave me its default timeout message, "Server Error". I know, by looking at the files in question, that the script is still running and doing its job.

Why do the prints not print in real time?

Thanks.

open(FILE, $data_file) or die("cannot open file : $!");
print "<p>opened file $data_file ok\n";
$a = 0;
while ($line = <FILE>) {
    $a++;

    #record every thousand lines as way to check progress
    if ($a % 1000 == 0) {
        open(pFILE, '>>progress.txt') or die("cannot open progress file : $!");
        print pFILE ", $a";
        close(pFILE);
    }

    #parse the line
    @line = split(//, $line);
    my $line_id = join('', $line[6], $line[7], $line[8]);
    if ($line_id eq $target_line_id) {
        my $respondent = join('', $line[0], $line[1], $line[2], $line[3]);

        # if this line meets the criteria
        if (($line[$col_1 - 1] eq $value_1) && ($line[$col_2 - 1] eq $value_2)) {
            $file = ">>".$data_dir."/t".$rid;

            #record this respondent in a text file
            open(rFILE, $file) or dienice("cannot open file : $_[0] $!");
            print rFILE "$respondent\n";
            close(rFILE);

            # output progress to browser
            print ", $respondent";
        }
    }
}
close(FILE);

Forget that fear of gravity,
Get a little savagery in your life.

Replies are listed 'Best First'.
Re: Why is this script not giving real-time output to browser?
by Anonymous Monk on Feb 10, 2005 at 17:46 UTC
    Buffering.

    First of all, your program is buffering its output to the server. You can disable this by setting $| to a true value. But even if you have turned off buffering of your program - your server might buffer. Or a proxy somewhere.
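
For anyone who hasn't used $| before, here is a minimal sketch of what turning off buffering looks like in a CGI script. The process_chunk sub is a hypothetical stand-in for the slow work; only $| itself is standard Perl. As noted above, this only stops Perl from buffering. The server or a proxy may still hold the output back.

```perl
use strict;
use warnings;

# Autoflush the currently selected handle (STDOUT unless select() changed it).
$| = 1;

# Hypothetical stand-in for one slow unit of work.
sub process_chunk {
    my ($n) = @_;
    return "chunk $n done";
}

print "Content-type: text/html\n\n";
for my $n (1 .. 3) {
    my $status = process_chunk($n);
    # With $| set, each print leaves Perl's buffer right away.
    print "<p>$status</p>\n";
}
```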

Re: Why is this script not giving real-time output to browser?
by eieio (Pilgrim) on Feb 10, 2005 at 17:51 UTC
    Besides disabling output buffering ($|++;), you may find that some browsers don't display the page until the script has completed. Others do display the partial page as it is generated, but a timeout occurs before the script has completed. In these cases, a solution based on this article by merlyn may be helpful.
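
A rough sketch of one common way around the timeout problem (not necessarily what the article does): run the long job in a child process, let it write its progress to a file, and have the browser poll a small status page. The sub names, the file argument and the refresh interval are all assumptions for illustration, and how a child must detach from the request varies by web server.

```perl
use strict;
use warnings;

# Build a small HTML page that shows the latest progress and reloads itself.
# $file is assumed to be a progress file that the worker appends to.
sub status_page {
    my ($file, $refresh_seconds) = @_;
    my $progress = 'not started';
    if (open my $fh, '<', $file) {
        local $/;                    # slurp the whole file in one read
        my $content = <$fh>;
        close $fh;
        $progress = $content if defined $content && length $content;
    }
    return join '',
        "Content-type: text/html\n\n",
        "<html><head><meta http-equiv=\"refresh\" content=\"$refresh_seconds\"></head>",
        "<body><p>Progress: $progress</p></body></html>\n";
}

# Start the long job in a child process so the request itself can finish quickly.
# $job is a code reference that does the real work.
sub start_job {
    my ($job) = @_;
    my $pid = fork();
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: give up STDOUT so the browser connection is not held open
        # by this process (assumption: some servers need more than this).
        close STDOUT;
        $job->();
        exit 0;
    }
    return $pid;    # parent carries on and can print the status page
}
```

The request that launches the job would call start_job with the processing code and then print status_page('progress.txt', 5); later reloads only need the status_page call.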

      I would recommend explicitly setting $| rather than using an increment operator on it.
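
One reason often given for this advice (my reading, it isn't spelled out in the reply above): $| behaves like an on/off flag rather than a counter, so increments and decrements don't pair up the way they look like they should. A small sketch to illustrate; the variable names are just for the demonstration.

```perl
use strict;
use warnings;

$| = 0;                            # start from the default: buffered
$|++;
$|++;
my $after_two_increments = $|;     # $| only ever reads back as 0 or 1

$|--;                              # a single decrement undoes both increments
my $after_one_decrement = $|;

$| = 1;                            # explicit: no doubt about the resulting state
my $after_assignment = $|;

print "after two increments: $after_two_increments\n";
print "after one decrement:  $after_one_decrement\n";
print "after assignment:     $after_assignment\n";
```

With a plain assignment, the state after the statement doesn't depend on what the value was before it.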

Re: Why is this script not giving real-time output to browser?
by saintmike (Vicar) on Feb 10, 2005 at 17:49 UTC
Re: Why is this script not giving real-time output to browser?
by FitTrend (Pilgrim) on Feb 10, 2005 at 17:54 UTC

    It sounds like you still have buffering on. Have you turned off buffering with the following?

    $| = 1;

    I did a quick search on the internet and this URL should be useful.

    Unless I'm missing something obvious, this should help.
Re: Why is this script not giving real-time output to browser?
by punch_card_don (Curate) on Feb 10, 2005 at 19:09 UTC
    You learn something new every day.

    $| = 1; caused it to spit out updates in real time. Thanks. It may just be a paranoid impression, but it seems to run slower like this, which is not surprising.


      Yes, it will run more slowly. Buffering is an optimization. It's almost always a good thing.

      If you really need to send data as soon as possible (because the time it takes to generate the data is much longer than the time you lose waiting to send optimally-sized network packets with all of the overhead and latency there), disabling buffering at the right place and for the right amount of time is a worthwhile tradeoff.

      Don't make the mistake too many people here do, though, in thinking that buffering is always a problem. It's not. It's on for very good reasons. They don't always apply, but it helps far more often than it hurts, in my experience.

      Disabling buffering globally by default is a code smell, in my opinion.
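
To make "the right place and the right amount of time" concrete, here is a sketch of two ways to limit the scope, assuming progress is printed to STDOUT. The sub names are made up for illustration; local and IO::Handle's flush method are standard Perl.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the flush() method on filehandles

# Option 1: autoflush only inside this sub. local restores the
# previous value of $| when the sub returns.
sub report_progress_unbuffered {
    my (@items) = @_;
    local $| = 1;
    print ", $_" for @items;
    return scalar @items;
}

# Option 2: leave buffering on and flush by hand every $every items.
# Returns how many periodic flushes were done.
sub report_progress_batched {
    my ($every, @items) = @_;
    my $count   = 0;
    my $flushes = 0;
    for my $item (@items) {
        print ", $item";
        if (++$count % $every == 0) {
            STDOUT->flush();
            $flushes++;
        }
    }
    STDOUT->flush();    # push out whatever is left at the end
    return $flushes;
}
```

The second form keeps most of the benefit of buffering, since the browser only needs an update every so often rather than after every print.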

Re: Why is this script not giving real-time output to browser?
by RazorbladeBidet (Friar) on Feb 10, 2005 at 18:12 UTC
    Are you printing the HTML header?

    print "Content-type: text/html\n\n";


    Just thought I'd check in case you hadn't
      That would certainly be one of those typical "oh, sh*t!" things, but, yes, I'm printing the html header earlier in the code. Thanks.
