Plankton has asked for the wisdom of the Perl Monks concerning the following question:

I have come up with a way to display the progress of a file download, but it requires resorting to shell script instead of Perl, a technique I often receive down votes for when I suggest it to others. I am assuming these down votes are deserved, so I am reaching out for advice on how to implement a "download progress display" in a CGI script without resorting to shell scripting. Here's the bit of code I use to get the initial size of the download. I couldn't figure out how to get this key piece of information via WWW::Mechanize or LWP, so in this snippet I "shell out" to wget, which tells me the size of the ZIP file I want to download.
if ( $download_status eq "START" ) {
    my $wgetcmd = "wget -o $download_log -O $zipfile -b $zipurl";
    system( $wgetcmd ) == 0
        or die __FILE__ . " [" . __LINE__ . "] cannot execute $wgetcmd : $!\n";
    sleep 1;
    my $shell_out = <<`SHELL`;
grep Length $download_log | awk '{print \$3}'
SHELL
    $download_size = $shell_out;
}
Later on in the script I set a cookie to maintain the state of the download. If the download is not complete, I simply "shell out" to grep, awk and tail on the wget log file to get the status of the download. I hope there is a Monk who can help with my bad habits.
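As an intermediate step, the grep/awk/tail calls can be replaced with a few lines of Perl that read the wget log directly (the `system` call to wget itself remains). This is a minimal sketch, assuming the log contains a line such as `Length: 1234567 (1.2M) [application/zip]` and dot-style progress lines that end in a percentage; the exact log format differs between wget versions, and `wget_log_status` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a wget log (written by `wget -o LOG`) and return
# ($total_bytes, $last_percent). Either value is undef if it was not found.
# Assumes wget's "Length:" line and dot-style progress output.
sub wget_log_status {
    my ($log_file) = @_;
    open my $fh, '<', $log_file or die "Cannot open $log_file: $!\n";
    my ( $total, $percent );
    while ( my $line = <$fh> ) {
        # "Length: 1,234,567 (1.2M) [application/zip]" -> 1234567
        if ( !defined $total && $line =~ /^\s*Length:\s*([\d,]+)/ ) {
            ( $total = $1 ) =~ tr/,//d;
        }
        # Remember the last "NN%" seen; this takes the place of `tail`.
        while ( $line =~ /(\d+)%/g ) {
            $percent = $1;
        }
    }
    close $fh;
    return ( $total, $percent );
}
```

With it, `my ($download_size, $percent) = wget_log_status($download_log);` would replace both shell-outs.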

Re: Help keep me from "shelling out" - download progress bar in cgi
by jwkrahn (Abbot) on Jun 15, 2009 at 04:42 UTC

    From LWP::Simple:

    head($url)
    Get document headers. Returns the following 5 values if successful: ($content_type, $document_length, $modified_time, $expires, $server)

    The second value returned ($document_length) tells you the size of the file.
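Applied to the original snippet, `head()` can replace the wget/grep/awk pipeline for the size lookup. A minimal sketch; `remote_size` is a hypothetical wrapper, and it returns undef both when the request fails and when the server sends no Content-Length header.

```perl
use strict;
use warnings;
use LWP::Simple qw(head);

# Hypothetical wrapper: size in bytes advertised by the server for $url.
# head() returns an empty list on failure, so $document_length is then undef;
# it is also undef when the server sends no Content-Length header.
sub remote_size {
    my ($url) = @_;
    my ( $content_type, $document_length ) = head($url);
    return $document_length;
}
```

In the CGI this would be `my $download_size = remote_size($zipurl);`, followed by a `defined` check before using the value.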

      Thanks! I am already using WWW::Mechanize. I would think my script would look funny if it had ...
      use WWW::Mechanize;
      use LWP::Simple;
      ... a minor point, but do you know of a way to get the $document_length using only WWW::Mechanize? Also, I need to be able to fork/background the process that carries out the download, and I need to be able to query that process about the progress of the download. You have already been very helpful and I thank you for that. I hope it is not too much to ask if you can answer these additional questions.
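For the fork/background part of the question, one common pattern is to fork a detached child that performs the download and records its progress in a small status file, which each later CGI request reads instead of grepping a wget log. This is a sketch under several assumptions: a Unix-like server (`fork`, `setsid`, `/dev/null`), a plain CGI rather than mod_perl, and a hypothetical one-line status format. `start_background_download`, `write_status` and `read_status` are illustrative names, not part of any module. The download uses LWP::UserAgent's `:content_cb` option; since WWW::Mechanize is a subclass, a Mechanize object could be used instead.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Hypothetical status-file format: one line, "<bytes received> <total|?>".
# Written to a temp file and renamed so readers never see half a line.
sub write_status {
    my ( $status_file, $received, $total ) = @_;
    my $tmp = "$status_file.tmp";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!\n";
    print {$fh} $received, ' ', ( defined $total ? $total : '?' ), "\n";
    close $fh or die "Cannot close $tmp: $!\n";
    rename $tmp, $status_file or die "Cannot rename $tmp: $!\n";
    return;
}

# Returns ($received, $total), or an empty list if no status exists yet.
# $total is undef when the server sent no Content-Length header.
sub read_status {
    my ($status_file) = @_;
    open my $fh, '<', $status_file or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    my ( $received, $total ) = split ' ', $line;
    return ( $received, ( defined $total && $total ne '?' ) ? $total : undef );
}

# Fork a detached child that downloads $url into $zipfile and keeps
# $status_file up to date. The parent gets the child's PID back at once.
sub start_background_download {
    my ( $url, $zipfile, $status_file ) = @_;

    $| = 1;    # flush pending CGI output so the child cannot repeat it
    defined( my $pid = fork ) or die "Cannot fork: $!\n";
    return $pid if $pid;    # parent: finish the CGI response as usual

    # Child: detach from the web server, otherwise the request stays open
    # until the download finishes.
    setsid();
    open STDIN,  '<', '/dev/null';
    open STDOUT, '>', '/dev/null';
    open STDERR, '>', '/dev/null';

    require LWP::UserAgent;    # loaded only in the child, at run time
    my $ua = LWP::UserAgent->new;
    open my $out, '>', $zipfile or exit 1;
    binmode $out;

    my $received = 0;
    my $res = $ua->get(
        $url,
        ':content_cb' => sub {
            my ( $chunk, $response ) = @_;
            print {$out} $chunk;
            $received += length $chunk;
            write_status( $status_file, $received,
                scalar $response->header('Content-Length') );
        },
    );
    close $out;

    # On success record total == received, so a reader can tell the
    # download is finished even if no Content-Length was sent.
    write_status( $status_file, $received, $received ) if $res->is_success;
    exit( $res->is_success ? 0 : 1 );
}
```

When `$download_status eq "START"`, the CGI could call `start_background_download($zipurl, $zipfile, "$zipfile.status")` and keep the status-file path in its cookie; later requests call `read_status` and compute a percentage when the total is known. Rewriting the status file on every chunk keeps the sketch short; a real script might update it less often.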
        From the WWW::Mechanize docs:
        WWW::Mechanize is a proper subclass of LWP::UserAgent and you can also use any of LWP::UserAgent's methods.
        So
        use WWW::Mechanize;
        will do.
        I would think my script would look funny if it had ...
        use WWW::Mechanize;
        use LWP::Simple;
        Nowhere, not even in "Perl Best Practices", is there any guideline that restricts you to a maximum number of modules you are allowed to use.

        Go ahead, the Perl Style Police will not come and arrest you!

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        #!/usr/bin/perl --
        use strict;
        use warnings;

        use WWW::Mechanize 1.54;

        my $ua  = WWW::Mechanize->new;
        my $uri = URI->new('http://cpan.org/');

        warn $ua->head($uri)->header('content-length');
        warn $ua->res->header('content-length');
        $ua->show_progress(1);
        $ua->get($uri);
        warn $ua->res->header('content-length');
        warn $ua->title;
        #use DDS;warn Dump($ua);
        __END__
        5810 at test.pl line 10.
        5810 at test.pl line 11.
        ** GET http://cpan.org/ ==> 13% 21% 46% 71%100%200 OK
        5810 at test.pl line 14.
        CPAN at test.pl line 15.

        Sorry, I've never used WWW::Mechanize and I don't know how to get the file size via this module.

      Note that not all HTTP servers send the Content-Length header that is returned by head(). Most servers do, and a lot even do so if the actual content is generated dynamically. But some servers don't, especially for dynamic content. In that case, $document_length is undefined.

      You can see this e.g. when downloading a file in Firefox: it always shows you the number of bytes downloaded so far and the time elapsed, and most of the time also the overall size of the download and the estimated remaining time. This happens when a Content-Length header was sent. Sometimes Firefox does not tell you anything about the total size or the remaining time, simply because it does not know: there was no Content-Length header in the response.
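A progress display can degrade the same way Firefox does: show a percentage when the total is known and only the byte count when it is not. A minimal sketch; `progress_text` is a hypothetical helper, and `$total` is whatever `head()` or the response's Content-Length header gave you, possibly undef.

```perl
use strict;
use warnings;

# Hypothetical helper: build a progress message. $total is undef when the
# server sent no Content-Length header; then only the byte count is shown.
sub progress_text {
    my ( $received, $total ) = @_;
    return "$received bytes received (total size unknown)"
        unless defined $total && $total > 0;
    my $percent = int( 100 * $received / $total );
    $percent = 100 if $percent > 100;
    return "$received of $total bytes ($percent%)";
}
```

For example, `progress_text(250, 1000)` gives `250 of 1000 bytes (25%)`, while `progress_text(250, undef)` falls back to the byte count.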

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Most CGI programs don't bother with Content-Length or Range headers, even when they should (i.e. when they serve file downloads).
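On the serving side, sending a Content-Length header for a file on disk only takes a `stat`. A minimal sketch of a hypothetical `send_download` helper that writes the CGI headers and the file to a given filehandle; the `application/zip` content type is an assumption, and Range requests are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: write CGI headers plus the file at $path to $out
# (STDOUT in a real CGI). Content-Length lets the browser show the total
# size and an estimated remaining time. Range requests are not handled.
sub send_download {
    my ( $path, $out ) = @_;
    my $size = ( stat $path )[7];
    die "Cannot stat $path: $!\n" unless defined $size;
    open my $in, '<', $path or die "Cannot open $path: $!\n";
    binmode $in;
    binmode $out;
    print {$out} "Content-Type: application/zip\r\n",
        "Content-Length: $size\r\n",
        "\r\n";
    my $buf;
    while ( read( $in, $buf, 64 * 1024 ) ) {
        print {$out} $buf;
    }
    close $in;
    return $size;
}
```

In a CGI it would be called as `send_download($path, \*STDOUT)`.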
Re: Help keep me from "shelling out" - download progress bar in cgi
by McDarren (Abbot) on Jun 15, 2009 at 06:29 UTC
Re: Help keep me from "shelling out" - download progress bar in cgi
by CountZero (Bishop) on Jun 15, 2009 at 06:16 UTC
    Can't you do a
    my $length = (stat($filename))[7];
    on the file?

    Or

    use File::stat;
    my $length = stat($filename)->size;
    if you cannot remember the indexes of the stat function?
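Combined with the total size obtained from `head()` when the download starts (which the original script could keep in its cookie), the size of the partially written file gives the progress without grepping a log. A minimal sketch; `percent_done` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of $partial_file downloaded so far, given
# the total size recorded when the download started (e.g. in a cookie).
# Returns undef if the total is unknown or the file does not exist yet.
sub percent_done {
    my ( $partial_file, $total ) = @_;
    return unless defined $total && $total > 0;
    my $size = ( stat $partial_file )[7];    # element 7 of stat is the size
    return unless defined $size;
    my $percent = int( 100 * $size / $total );
    return $percent > 100 ? 100 : $percent;
}
```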

    CountZero
