
Re^6: Need help with WWW::Mechanize and Chrome cookies

by marto (Cardinal)
on Jul 09, 2021 at 16:59 UTC ( #11134854 )

in reply to Re^5: Need help with WWW::Mechanize and Chrome cookies
in thread Need help with WWW::Mechanize and Chrome cookies

Either find the links, get them, and save them; inject something like this and call it from the page for each target you've identified; submit a patch adding the required functionality to this module; or choose some other approach to achieve your goal. Unless you need JavaScript there should be alternatives, but your post lacks enough detail to expand on that.


Re^7: Need help with WWW::Mechanize and Chrome cookies
by bakiperl (Beadle) on Jul 09, 2021 at 22:11 UTC
    The links to these files, e.g.
    <a class="txt" href="file.txt"> Text File </a>
    can be obtained from the WMC instance by doing something like this:
    my @links = $mech->find_all_links( text_contains => 'some description etc...' );
    my @urls  = map { $_->[0] } @links;
    In the case of WWW::Mechanize (WM), you can simply download the files using this code:
    for my $foo (@urls) {
        my $filename = '/path/' . $foo;
        $mech->get( $foo, ':content_file' => $filename );
    }
    Unfortunately, this does not work with WWW::Mechanize::Chrome (WMC). I hope the author of WMC can shed some light on this or provide a patch. Thank you.

      By "does not work", what do you mean exactly?

      If by that, you mean, "it's not documented, and not implemented", maybe you want to help implement it?

      Alternatively, you can maybe use

      ...
      my $filename = '/path/' . $foo;
      $mech->get( $foo );
      my $img = $mech->content();
      # save the image to disk
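Filling in that "save to disk" comment, a minimal sketch might look like the following. This is a hedged illustration, not WMC's own API for downloads: it assumes $mech is a live WWW::Mechanize::Chrome instance that can reach the target URLs, that @urls holds absolute URLs, and that the last path segment of each URL is a usable filename. It cannot run without a browser attached.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Assumption: $mech is an existing, connected WWW::Mechanize::Chrome
# object and @urls contains absolute URLs from find_all_links().
for my $url (@urls) {
    $mech->get($url);
    my $body = $mech->content();    # body of the last response

    # Derive a local filename from the URL's last path segment.
    my $filename = File::Spec->catfile('/path', basename($url));

    # ':raw' so binary files (images, CSV with CRLF, etc.) survive intact.
    open my $fh, '>:raw', $filename
        or die "Cannot open $filename: $!";
    print {$fh} $body;
    close $fh or die "Cannot close $filename: $!";
}
```

Whether content() returns the raw bytes of a non-HTML response in WMC (rather than Chrome's rendered wrapper page) is exactly the open question in this thread, so treat this as a starting point to test, not a guaranteed fix.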
        Here is the issue:
        First, let's start with the HTML file I used to test the script (WMC.html):
        <html>
        <head>
        <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
        <title>Testing hyperlink file Downloads</title>
        </head>
        <body>
        <h2>Testing Download of hyperlinked files using WWW::Mechanize::Chrome</h2>
        <p></p>
        Let's try downloading this <a href="/my_Files/csv_File.csv">CSV File</a>
        <br/><br/>
        </body>
        </html>
        Now here is the Perl script.
        #!/usr/bin/perl -w
        use strict;
        use Log::Log4perl qw(:easy);
        use URI;                       # needed for URI->new below
        use WWW::Mechanize;
        use WWW::Mechanize::Chrome;

        my $cookie_dir = 'C:/Users/some_user/AppData/Local/Google/Chrome/User Data/Default/'; # Chrome cookies path
        #my $mech = WWW::Mechanize::Chrome->new( data_directory => $cookie_dir );
        my $mech = WWW::Mechanize->new();

        my $uri = URI->new( "" );
        $mech->get( $uri );
        unless ( $mech->success ) {
            my $mesg = $mech->response->status_line;
            print $mesg;
            goto FINISH;
        }

        my $path  = "/path";
        my @links = $mech->find_all_links( url_regex => qr/\.csv/i );
        my @urls  = map { $_->[0] } @links;

        for my $foo (@urls) {
            my $filename = $path . $foo;
            $mech->get( $foo, ':content_file' => $filename );
            my $file_content = $mech->get( $foo );
            print $file_content->content();
        }
        print "Success\n";
        FINISH:
        When I use the WWW::Mechanize instance, the script runs fine. It prints and saves the file content to disk.
        However, when the WWW::Mechanize::Chrome instance is used I get the following error message:
        Cannot navigate to invalid URL -32000 at C:/Perl/perl/site/lib/Chrome/DevToolsProtocol/ line 490
        The hyperlinked files are not saved to disk (unless they are going somewhere other than the declared directory), and the code you suggested returns the HTML document instead of the file content.
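One hedged reading of the "Cannot navigate to invalid URL" error: the hrefs extracted via $_->[0] are relative (e.g. "/my_Files/csv_File.csv" in WMC.html above), and Chrome's navigation API rejects relative URLs, whereas plain WWW::Mechanize resolves them against the page's base for you. A sketch of absolutizing them first with the URI module, using a hypothetical base URL:

```perl
use strict;
use warnings;
use URI;

# Assumption: the page lives at this (made-up) base URL, and @raw
# holds the relative hrefs pulled out of the link objects.
my $base = URI->new('http://localhost/WMC.html');
my @raw  = ('/my_Files/csv_File.csv');

# Resolve each relative href against the base before calling get().
my @abs = map { URI->new_abs($_, $base)->as_string } @raw;

print "$_\n" for @abs;   # http://localhost/my_Files/csv_File.csv
```

With plain WWW::Mechanize the link objects also offer $link->url_abs for this; whether WMC's link objects behave the same way is worth checking against its documentation.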
