Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

hi monks

I am having problems fetching files from a certain kind of URI. I first used LWP::UserAgent (same problem), but I have now switched to File::Fetch, as it seems quite suitable for my task. I came up with this:

#!/usr/bin/perl
use strict;
use warnings;
use File::Fetch;

my $url=" http://www.ekey.net/downloads-475?download=2132cbe2-2fb1-eeff-583c-50a39b6aba6c&name=v2_ITA_12-Seiter_Programm_1207_web.pdf";
print "Downloading $url\n";
my $ff = File::Fetch->new(uri => $url);
my $where = $ff->fetch( to => "ekey corpus" ) or $ff->error;

The file definitely exists at that location. Copy and paste the URL into Firefox and you get prompted to download the file (Firefox knows that the file name starts after the last "=").

Any idea why this does not work?

Replies are listed 'Best First'.
Re: Fetch Problem uri
by Perlbotics (Archbishop) on Jul 04, 2015 at 09:51 UTC

    Works for me when I remove the leading whitespace from $url ;-)

    Update (2nd question - see below): Change destination filename.

    It seems that the module does not provide a setter method to change output_file(). I'll notify the author later...

    This monkeypatch should correct that:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Fetch;

    my $url="http://www.ekey.net/downloads-475?download=2132cbe2-2fb1-eeff-583c-50a39b6aba6c&name=v2_ITA_12-Seiter_Programm_1207_web.pdf";
    print "Downloading $url\n";
    my $ff = File::Fetch->new(uri => $url);

    { #-- ugly patch
        package File::Fetch;
        print "PATCHED: ", __PACKAGE__, " version $VERSION!\n";   #-- patched line#0
        sub output_file {
            my $self = shift;
            my $file = $self->file;
            $self->{_out} = $_[0] if $_[0];           #-- patched line#1
            return $self->{_out} if $self->{_out};    #-- patched line#2
            $file =~ s/\?.*$//g;
            $file ||= $self->file_default;
            return $file;
        }
    }

    package main;
    print "Before: ", $ff->output_file,"\n";
    $ff->output_file("i-probably-violate-terms-of-use.pdf");   #-- or extract name from URI using a regex
    print "After : ", $ff->output_file,"\n";
    my $where = $ff->fetch( to => "ekey corpus" ) or $ff->error;

    Result:
    ...
    PATCHED: File::Fetch version 0.48!
    Before: downloads-475
    After : i-probably-violate-terms-of-use.pdf
    ...

    Fixing this problem by use of parent is left as an exercise to the AM...
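
A minimal sketch of that exercise, for readers who prefer a subclass to patching the module in place. It assumes that fetch() takes the target name from $self->output_file, which is what the monkeypatch above relies on. The package name My::Fetch and the hash key _out_override are made up for illustration and are not part of File::Fetch. Nothing is downloaded here; the script only shows the name before and after the override.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Hypothetical subclass; the name My::Fetch is an illustration only.
    package My::Fetch;
    use parent 'File::Fetch';

    sub new {
        my ($class, %args) = @_;
        my $self = $class->SUPER::new(%args) or return;
        # Rebless defensively in case the parent blessed into its own class.
        return bless $self, $class;
    }

    # Getter/setter: remember an explicit name, otherwise defer to File::Fetch.
    sub output_file {
        my $self = shift;
        $self->{_out_override} = $_[0] if @_ && defined $_[0] && length $_[0];
        return $self->{_out_override} if defined $self->{_out_override};
        return $self->SUPER::output_file;
    }
}

my $url = 'http://www.ekey.net/downloads-475?download=2132cbe2-2fb1-eeff-583c-50a39b6aba6c&name=v2_ITA_12-Seiter_Programm_1207_web.pdf';
my $ff  = My::Fetch->new(uri => $url) or die "could not parse $url\n";

my $before = $ff->output_file;                               # name derived by File::Fetch
$ff->output_file('v2_ITA_12-Seiter_Programm_1207_web.pdf');  # explicit override
my $after  = $ff->output_file;

print "Before: $before\n";
print "After : $after\n";
```

If the assumption about fetch() holds, calling $ff->fetch( to => "ekey corpus" ) on this object should then write the file under the overridden name.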

      Oh, what an idiot! Thank you.

      There is still a smaller problem, though. The filename is parsed wrongly, so I get a file called downloads-475 (without .pdf). Any idea why? Or do I just have to parse it correctly myself?

Re: Fetch Problem uri
by 1nickt (Canon) on Jul 04, 2015 at 10:08 UTC

    Regarding your second problem, the docs for File::Fetch say:

    $ff->output_file
        The name of the output file. This is the same as $ff->file, but any query parameters are stripped off. For example: http://example.com/index.html?x=y would make the output file be index.html rather than index.html?x=y.

    However, output_file() is an accessor only, so you can't change the value.

    UPDATE: Better explained and shown with a patch in the reply above.

    You probably would like to do something like:

    my $ff = File::Fetch->new(uri => $url);
    my $output_name = $ff->name;
    $ff->file =~ /name=(.*)$/ and $output_name = $1;
    # $output_name is now 'v2_ITA_12-Seiter_Programm_1207_web.pdf'
    $ff->output_file( $output_name );

    ... but that doesn't work.

    Update: The errors below were caused by the stray leading space in the URL.

    That's by (poor) design, but the module seems to have other problems, as the accessor methods don't seem to do what they say:

    my $ff = File::Fetch->new(uri => $url);
    say "scheme: " . $ff->scheme;
    say "host: " . $ff->host;
    say "path: " . $ff->path;
    say "file: " . $ff->file;
    say "output_file: " . $ff->output_file;

    ## outputs:
    Use of uninitialized value in concatenation (.) or string at ./foo.pl line 12.
    scheme:
    host: http:
    path: //www.ekey.net/
    file: downloads-475?download=2132cbe2-2fb1-eeff-583c-50a39b6aba6c&name=v2_ITA_12-Seiter_Programm_1207_web.pdf
    output_file: downloads-475

    Remember: Ne dederis in spiritu molere illegitimi!

      Nice patch!

      Now I just have to figure out how to get the right file name out of the URI. That is not as easy as it seems, because the URI contains query parameters. I tried the following without success:

      my $filename = (URI->new($url)->path_segments)[-1];
      my ($volume,$directories,$filename) = File::Spec->splitpath( $url );

      Strange that there seems to be no module available that copes correctly with this URI. Or maybe the URI is "non standard"?

        (Scroll down for an answer to your latest question ...)

        Strange that there seems to be no module available that copes correctly with this URI. Or maybe the URI is "non standard"?

        Your URI is fine (until you added a space at the start, heh). The module is maybe what is "non-standard," I am afraid.

        First, the problem addressed by Perlbotics' patch: $ff->output_file cannot be used to set the value, although it looks as if it should.

        Then the ungraceful handling of a problem URI (e.g. with a leading space as in your OP):

        my $url = ' http://www.perlmonks.com/foo?bar=baz';
        print "Downloading >$url<\n"; # note use of delimiters to make a stray
                                      # leading space more visible in your debug
        my $ff = File::Fetch->new(uri => $url);
        say "scheme: " . $ff->scheme;
        say "host: " . $ff->host;
        say "path: " . $ff->path;
        say "file: " . $ff->file;
        say "output_file: " . $ff->output_file;

        ## outputs:
        Use of uninitialized value in concatenation (.) or string at ./foo.pl line 10.
        scheme:                      # <- error
        host: http:                  # <- error
        path: //www.perlmonks.com/   # <- error
        file: foo?bar=baz
        output_file: foo

        These two things alone would make me consider looking for a different solution on CPAN.
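
Whichever module ends up doing the download, the leading-space problem can be caught before the URL gets that far. The following is only a sketch: clean_url is a hypothetical helper, and its check is deliberately loose (http/https scheme, a host, no whitespace anywhere) rather than a full URI validation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: trim surrounding whitespace and insist on an
# http/https scheme, a host and no embedded whitespace. Returns the
# cleaned URL, or undef if the string does not look usable.
sub clean_url {
    my ($raw) = @_;
    return unless defined $raw;
    (my $url = $raw) =~ s/^\s+|\s+$//g;
    return $url =~ m{^https?://[^/\s]+\S*$}i ? $url : undef;
}

for my $candidate (' http://www.perlmonks.com/foo?bar=baz', 'not a url') {
    my $url = clean_url($candidate);
    # Delimiters make stray whitespace visible, as in the debug print above.
    print defined $url ? "ok : >$url<\n" : "bad: >$candidate<\n";
}
```

Passing the return value of clean_url to File::Fetch->new, and dying when it is undef, would have turned the original silent failure into an immediate error message.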

        Now I just have to figure out how to get the right file name out of the URI.

        You are on the right track with a path-parsing module. But if all your files are of the format you showed, you might want to use a regexp:

        #!/usr/bin/env perl -w
        use strict;

        my $url = 'http://www.ekey.net/downloads-475?download=2132cbe2-2fb1-eeff-583c-50a39b6aba6c&name=v2_ITA_12-Seiter_Programm_1207_web.pdf';
        (my $output_name = $url) =~ s/^.*name=(.*)$/$1/;
        print "$output_name\n";
        __END__

        ## outputs:
        v2_ITA_12-Seiter_Programm_1207_web.pdf

        Remember: Ne dederis in spiritu molere illegitimi!
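
The regexp above depends on name= being the last parameter in the URL. If the parameter order can vary, or the value may be percent-encoded, splitting the query string into key/value pairs is a sturdier option. Here is a minimal sketch using only core Perl; query_param is a hypothetical helper, it decodes only the value (not the key), and it returns the first match if a parameter is repeated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the decoded value of one query parameter,
# or undef if the URL has no query string or no such parameter.
sub query_param {
    my ($url, $wanted) = @_;
    my ($query) = $url =~ /\?([^#]*)/ or return;    # text between '?' and '#'
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && $key eq $wanted;
        $value = '' unless defined $value;
        $value =~ tr/+/ /;                              # '+' encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # undo percent-encoding
        return $value;
    }
    return;
}

my $url = 'http://www.ekey.net/downloads-475?download=2132cbe2-2fb1-eeff-583c-50a39b6aba6c&name=v2_ITA_12-Seiter_Programm_1207_web.pdf';
my $output_name = query_param($url, 'name');

# Keep only the last path component, so a value such as '../x' cannot
# point outside the download directory.
$output_name =~ s{^.*[/\\]}{}s if defined $output_name;

print defined $output_name ? "$output_name\n" : "no name parameter\n";
```

The resulting name can then be handed to whatever sets the output file, for example the patched output_file() shown earlier in the thread.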