pawaniitd has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of URLs of PDF files that I want to download from different sites.

In Firefox I have chosen the option to save PDF files directly to a particular folder.

My plan was to use WWW::Mechanize::Firefox in Perl to download each file in the list, one by one, using Firefox, and to rename each file after it has been downloaded.

I used the following code to do it:

use WWW::Mechanize::Firefox;
use File::Copy;

# @list contains the list of links to pdf files
foreach $x (@list) {
    my $mech = WWW::Mechanize::Firefox->new(autoclose => 1);

    #This downloads the file using firefox in desired folder
    $mech->get($x);

    opendir(DIR, "output/download");
    @FILES= readdir(DIR);
    my $old = "output/download/$FILES[2]";
    move ($old, $new); # $new is the URL of the new filename
}

When I run the script, it opens the first link in Firefox, and Firefox downloads the file to the desired directory. But after that the new tab is not closed, the file does not get renamed, and the code keeps running (as if it has hit an endless loop); no further files get downloaded.

What is going on here? Why isn't the code working? How do I close the tab and make the code process every file in the list? Is there an alternative way to download the files?

Re: Downloads in firefox using perl WWW::Mechanize::Firefox
by kcott (Archbishop) on Mar 11, 2014 at 10:59 UTC

    G'day pawaniitd,

    Welcome to the monastery.

    This line:

    my $old = "output/download/$FILES[2]";

    looks very wrong to me. $FILES[2] will be the third directory entry read from "output/download". Are you sure that's what you want?

    I suggest you change the last line in your foreach block to:

    #move ($old, $new); # $new is the URL of the new filename
    print "old='$old'; new='$new'\n";

    When you see each line of output contains "new=''", add

    use strict;

    to the top of your script. It will tell you about this and other problems you may have. You'll also benefit from adding:

    use warnings;

    to find out about other types of problems you didn't notice.

    The next issue, now that your script compiles, will (probably) be "old='old_filename'" having the same value in every iteration of the loop. I suspect you'll need to rethink how you set $old.

    [I don't use Firefox so I can't help with your tabbing issues.]

    -- Ken

      Hello Ken,

      The $FILES[2] is actually the first file in the directory as the first 2 elements of that array are '.' and '..'. I have tested it separately and it works.

      When i add a print statement after the $mech->get() function, i do not get any output.

      The use of :

      use strict;
      use warnings;

      does not give any error.

      The $old is set by getting the filename of the file that has been downloaded, so it is not the same value.

        Then can you show us the complete program? Just using what you have provided, we are forced to guess you wrote the wrong program.

        "The $FILES[2] is actually the first file in the directory as the first 2 elements of that array are '.' and '..'. I have tested it separately and it works."

        Well, I wrote "[it] will be the third directory entry", so retorting with "[it] is actually the first file" comes across as purely argumentative. On a *nix system, '.' and '..' are files. Perhaps tell us what OS you're using.

        Semantics aside, you're assuming that $FILES[2] will hold the correct filename for $old. Is that a reasonable assumption for the entire life of your script? How many people have access to "output/download" (including yourself)? Is it at all possible that any of those people could modify (even inadvertently) the contents of "output/download"? Is there a better way to generate the value for $old (perhaps based on the name of the file currently being downloaded and a modification time)?

        "When i add a print statement after the $mech->get() function, i do not get any output."

        How is that relevant in a reply to my post? Assuming it is somehow relevant, just stating "i add a print statement" is entirely insufficient information. What is this print statement?

        "The use of : use strict; use warnings; does not give any error."

        Copying the code you posted (verbatim) to a file (pm_example.pl) and adding use strict; use warnings;, perl -c pm_example.pl gives me

        $ perl -c pm_example.pl
        Global symbol "$x" requires explicit package name at pm_example.pl line 10.
        Global symbol "@list" requires explicit package name at pm_example.pl line 10.
        Global symbol "$x" requires explicit package name at pm_example.pl line 14.
        Global symbol "@FILES" requires explicit package name at pm_example.pl line 17.
        Global symbol "@FILES" requires explicit package name at pm_example.pl line 18.
        Global symbol "$new" requires explicit package name at pm_example.pl line 19.
        pm_example.pl had compilation errors.

        You may have my @list before the loop (but forgot to show it). Within the scope of the loop, however, the other variables listed in that perl -c output should be declared as lexical variables (probably using my).

        As ++robby_dobby points out, when you fail to show your code we can only guess at what it might be. Given this is your first posting here, please familiarise yourself with the guidelines in "How do I post a question effectively?".

        "The $old is set by getting the filename of the file that has been downloaded, so it is not the same value."

        Incorrect. As I pointed out above, $old is based on the value of $FILES[2] (which has no guaranteed correlation with the download file).

        -- Ken
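
One way to avoid relying on $FILES[2], offered only as a minimal sketch: record which files are in the download directory before the download starts, then look for names that were not there before. The subroutine names snapshot and new_files are hypothetical, and the usage shown in the trailing comment assumes the "output/download" directory and the $mech, $url and $new variables from the original post. The sketch uses only core Perl, and it does not by itself tell you whether the download has finished.

```perl
use strict;
use warnings;

# Return a hash reference whose keys are the names of the plain files in $dir.
sub snapshot {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    # The -f test skips '.', '..' and any subdirectories.
    my %seen = map { $_ => 1 } grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return \%seen;
}

# Return the sorted names of files in $dir that are absent from $before.
# Call this in list context.
sub new_files {
    my ($dir, $before) = @_;
    my $after = snapshot($dir);
    my @new   = grep { !$before->{$_} } keys %$after;
    return sort @new;
}

# Hypothetical usage inside the download loop:
#   my $before = snapshot('output/download');
#   $mech->get($url);
#   my ($name) = new_files('output/download', $before);
#   move("output/download/$name", $new) if defined $name;
```

Because the result depends on what changed in the directory rather than on the order readdir happens to return, files that other people add earlier or later do not shift the index of the one you want.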

Re: Downloads in firefox using perl WWW::Mechanize::Firefox
by robby_dobby (Hermit) on Mar 11, 2014 at 10:05 UTC
    Hello pawaniitd,

    Welcome to the monastery. Did you have a look at the manual to see if you missed anything? You want a new tab to be opened every time you invoke the new constructor. Have a look at the list of options detailed in the manual here. Do they tell you anything? :-)

    PS: Skills are gained not by running out at the first failure, but by patience and tenacity. Read manuals, code - gather as much as you can. Once again, welcome and have fun!

      Hello robby_dobby,

      I have already read the manual and have been trying various options to get around the problem. The problem is not opening a new tab. I have no problem opening a new tab when I set Firefox's default behavior for PDF files to 'view in Firefox' or 'View in adobe reader'. The problem arises when I set it to 'Save link' and assign a folder for downloaded files.

      What happens in this case is that the program runs the loop the first time, loads the page, and Firefox downloads the file; then the program just pauses there and never proceeds. If I add a print "reached"; statement after the $mech->get() call, I do not see any output.

      I want to understand why this happens.

        Are you running this behind a proxy? Chances are that Firefox got stuck trying to resolve the URL you're downloading through that proxy and failed to authenticate your credentials. Why not use WWW::Mechanize? I'm pretty sure it has some options for proxy authentication.
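
As a rough sketch of the WWW::Mechanize route suggested above, the following saves each URL straight to a file without going through Firefox. The subroutine names filename_for and download_all and the fallback name download.pdf are hypothetical; the ':content_file' option to get() and the env_proxy method come from LWP::UserAgent, which WWW::Mechanize extends. It assumes the PDFs can be fetched without JavaScript or a logged-in browser session, and it has not been tried against the sites in question.

```perl
use strict;
use warnings;

# Derive a local filename from the last path segment of a URL.
# Falls back to 'download.pdf' when the URL path has no final segment.
sub filename_for {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;               # drop query string and fragment
    $path =~ s{^[a-z][a-z0-9+.-]*://[^/]*}{}i;      # drop scheme and host
    my ($name) = $path =~ m{([^/]+)$};
    return defined $name ? $name : 'download.pdf';
}

# Fetch every URL in @urls and save it under $dir.
sub download_all {
    my ($dir, @urls) = @_;

    # CPAN module, loaded here so filename_for() can be used without it.
    require WWW::Mechanize;

    # autocheck => 1 makes get() die on an HTTP error.
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->env_proxy;    # use http_proxy etc. from the environment, if set

    for my $url (@urls) {
        my $target = "$dir/" . filename_for($url);
        # ':content_file' writes the response body directly to $target.
        $mech->get($url, ':content_file' => $target);
        print "saved $url as $target\n";
    }
    return;
}

# Hypothetical usage with the list from the original post:
#   download_all('output/download', @list);
```

Since the script chooses the target filename itself, there is no need to read the download directory afterwards to find out what the file was called.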