PerlMonks  

Rename/mkdir with File::Fetch

by justin423 (Scribe)
on Aug 10, 2022 at 15:03 UTC ( [id://11146079]=perlquestion )

justin423 has asked for the wisdom of the Perl Monks concerning the following question:

I posted a question here before about File::Fetch and got a bunch of great responses, so thank you all. It doesn't seem like this is possible, but can File::Fetch either rename the files to a particular filename or, alternatively, create a new directory each time with a predetermined folder name? The URLs are in the format www.example.com/document_id/document.pdf, where document_id is a unique number provided by the publisher, so all the files to fetch are named document.pdf. So that each successive document doesn't overwrite the previous one, I rename them to document0.pdf, document1.pdf, and so on, using a loop counter to keep them unique (see code below). Is there a way to either change the filename to document_id.pdf, or make a new directory data/documents/document_id/ and save document.pdf into that folder? I think File::Fetch only takes one variable as input and won't work with SELECT DOCUMENT_ID,URL FROM LINKS.
my $query = "select url FROM LINKS"; # << minor edit
my $sth = $dbh->prepare($query) or die "prepare: ".$dbh->errstr;
$sth->execute() or die "execute: ".$dbh->errstr;
$i = 0;
while (my $ref = $sth->fetchrow_hashref()) {
    print "\nurl: $ref->{url}\n";
    my $ff = File::Fetch->new(uri => $ref->{url});
    my $where = $ff->fetch(to => '/data/documents/');
    my $error = $ff->error();
    rename("C:/data/documents/document.pdf", "C:/data/documents/document$i.pdf");
    $i++;
}

Replies are listed 'Best First'.
Re: Rename/mkdir with File::Fetch
by hippo (Bishop) on Aug 10, 2022 at 16:06 UTC

    Perhaps given the previous thread and this one, File::Fetch is not the most appropriate module for your particular task. There are many other modules on CPAN to help with downloading of files. You probably already have LWP installed so you could use LWP::Simple::getstore which accepts both a URL to fetch and a local filename to store it in.


    🦛

      Thanks for the suggestion. Let me try that. The mkdir works as well, so I can try out both ways and see which one I like better.
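A minimal sketch of the getstore approach suggested above, assuming the query is changed to select both document_id and url. The helper names local_name and save_document and the ID-based naming scheme are made up for illustration. LWP::Simple is not a core module, so the sketch loads it only when a download is attempted.

```perl
use strict;
use warnings;

# Build the final local filename from the document ID (hypothetical scheme).
sub local_name {
    my ($dir, $id) = @_;
    return "$dir/$id.pdf";
}

# Download one URL straight to its final name; returns true on success.
sub save_document {
    my ($url, $id, $dir) = @_;
    require LWP::Simple;    # not in core, so load it on first use
    # getstore() returns the HTTP status code of the response
    my $status = LWP::Simple::getstore($url, local_name($dir, $id));
    my $ok = $status >= 200 && $status < 300;
    warn "fetch of $url failed with status $status\n" unless $ok;
    return $ok;
}

# Inside the existing while loop this would be called as:
#   save_document($ref->{url}, $ref->{document_id}, '/data/documents');
```

Because the target filename is passed in directly, no rename step is needed afterwards.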
Re: Rename/mkdir with File::Fetch
by Corion (Patriarch) on Aug 10, 2022 at 15:26 UTC

    The documentation of File::Fetch suggests no way to specify the name of the output file. Since you already have the ID, I suggest creating a directory for each ID and saving the file there:

    my $id = $ref->{document_id};
    my $dir = "/data/documents/$id";
    mkdir $dir or die $!;
    my $ff = File::Fetch->new(uri => $ref->{url});
    $ff->fetch( to => $dir );
      I got this to work.
      my $tmp_dir='/data/documents/';
      while (my $ref = $query->fetchrow_hashref()) {
          print "url: $ref->{url}\n";
          my $id = $ref->{document_id};
          my $dir = "/data/documents/";
          # mkdir $dir or die $!;
          my $ff = File::Fetch->new(uri => $ref->{url});
          $ff->fetch( to => $dir );
          rename("$tmp_dir/document.pdf", "$tmp_dir$id.pdf") || die ( "Error in renaming" );
      }
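The directory-per-ID idea can also be sketched with a few more checks. This is only an illustration under stated assumptions: the helper names document_dir and fetch_document are hypothetical, the ID is assumed to be purely numeric (the question calls it "a unique number"), and make_path from File::Path is swapped in for mkdir so that re-running the script does not die on a directory that already exists.

```perl
use strict;
use warnings;
use File::Fetch;
use File::Path qw(make_path);

# Return the per-document directory, refusing IDs that are not purely numeric.
sub document_dir {
    my ($base, $id) = @_;
    die "unexpected document id '$id'\n" unless $id =~ /\A\d+\z/;
    return "$base/$id";
}

# Fetch one document into its own directory; returns the saved path or dies.
sub fetch_document {
    my ($base, $id, $url) = @_;
    my $dir = document_dir($base, $id);
    make_path($dir);    # not an error if the directory already exists
    my $ff = File::Fetch->new(uri => $url)
        or die "cannot parse URI '$url'\n";
    my $where = $ff->fetch(to => $dir)
        or die "fetch of $url failed: " . ($ff->error // 'unknown error') . "\n";
    return $where;
}

# Inside the loop, with a query that selects both columns:
#   fetch_document('/data/documents', $ref->{document_id}, $ref->{url});
```

Each document.pdf then lives in its own folder, so nothing is overwritten and no rename is required.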
Re: Rename/mkdir with File::Fetch
by Anonymous Monk on Aug 10, 2022 at 20:37 UTC
    my $ff = File::Fetch->new(uri=>$ref->{url});
    $ff->file("document$i.pdf");
    my $where = $ff->fetch ...

      $ff->file() is an unfortunate name; it is documented as "The name of the remote file". At least for HTTP, a document available via a URI might come from a database, or it might just be generated on the fly, so there might be no file at all.

      Plus, calling $ff->file(...) before calling $ff->fetch(...) will change the source URI for at least some URI schemes and fetch methods, which is NOT the desired behaviour.

      But: The documentation for $ff->file() has a second sentence, and that's the real hint:

      For the local file name, the result of $ff->output_file will be used.

      It might look like you could simply call $ff->output_file("document$i.pdf"), but a quick look into the source code shows that more work is needed:

      sub output_file {
          my $self = shift;
          my $file = $self->file;
          $file =~ s/\?.*$//g;
          $file ||= $self->file_default;
          return $file;
      }

      So, you need to replace output_file, either by monkeypatching or by inheriting from File::Fetch and replacing that method with one that returns the desired file name.
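The inheritance route might look roughly like the sketch below. The class name Local::NamedFetch and the local_name accessor are invented for this example, and storing the name in the object relies on File::Fetch objects being blessed hashes, which is an implementation detail rather than a documented interface. Constructing the object does not touch the network; only fetch() does.

```perl
use strict;
use warnings;

# Hypothetical subclass; only output_file is overridden.
package Local::NamedFetch {
    use parent 'File::Fetch';

    # Get or set the wanted local filename.
    # Assumption: the object is a blessed hash (implementation detail).
    sub local_name {
        my $self = shift;
        $self->{_local_name} = shift if @_;
        return $self->{_local_name};
    }

    # Use the chosen name if one was set, else fall back to the default.
    sub output_file {
        my $self = shift;
        return $self->local_name // $self->SUPER::output_file;
    }
}

my $ff = Local::NamedFetch->new(uri => 'http://www.example.com/12345/document.pdf');
$ff->local_name('12345.pdf');
print $ff->output_file, "\n";    # prints 12345.pdf

# $ff->fetch(to => '/data/documents') should then save under that name,
# provided the installed File::Fetch builds its target from output_file.
```

Leaving $ff->file() untouched means the source URI stays as it was, which avoids the problem described above.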

      Or, if all you need is to fetch from HTTP or HTTPS, follow hippo's++ advice and use LWP::Simple.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        ?? Did you try it?? I'm on mobile
