PerlMonks  

Re^14: Need help with WWW::Mechanize and Chrome cookies

by Corion (Patriarch)
on Jul 11, 2021 at 19:07 UTC (id://11134921)


in reply to Re^13: Need help with WWW::Mechanize and Chrome cookies
in thread Need help with WWW::Mechanize and Chrome cookies

Why do you expect the following to work for WWW::Mechanize::Chrome? This is not a documented call:

$mech->get($foo, ':content_file'=>$filename);

I wonder why you say that the following does not "return anything":

my $file_content = $mech->get($foo);

->get() is documented to return a response, so I suggest that you print it, or inspect it using Data::Dumper.

Replies are listed 'Best First'.
Re^15: Need help with WWW::Mechanize and Chrome cookies
by bakiperl (Beadle) on Jul 11, 2021 at 20:05 UTC
    Corion,
    I get an empty string when I print the file content.
    my $file_content = $mech->get($foo);
    print $file_content;
    Here is the result returned by Data::Dumper:
    $VAR1 = bless( {
      '_headers' => bless( {
        '::std_case' => {
          'x-frame-options' => 'X-Frame-Options',
          'expect-ct' => 'Expect-CT',
          'content-security-policy' => 'Content-Security-Policy',
          'x-xss-protection' => 'X-XSS-Protection',
          'x-content-type-options' => 'X-Content-Type-Options',
          'strict-transport-security' => 'Strict-Transport-Security',
          'referrer-policy' => 'Referrer-Policy'
        },
        'content-security-policy' => 'default-src \'self\' data: https: \'unsafe-eval\' \'unsafe-inline\'',
        'date' => 'Sun, 11 Jul 2021 19:46:11 GMT',
        'strict-transport-security' => 'max-age=31536000; includeSubDomains',
        'etag' => '"06c81313776d71:0"',
        'expect-ct' => 'enforce, max-age=30, report-uri="https://{$subdomain}.report-uri.com/r/d/ct/enforce"',
        'x-frame-options' => 'SAMEORIGIN',
        'server' => '',
        'x-content-type-options' => 'nosniff',
        'x-xss-protection' => '1;mode=block',
        'accept-ranges' => 'bytes',
        'referrer-policy' => 'no-referrer'
      }, 'HTTP::Headers' ),
      '_request' => undef,
      '_content' => '',
      '_rc' => 304,
      '_msg' => 'Not Modified'
    }, 'HTTP::Response' );
      Corion,
      It looks like the issue is related to the file type. If the .csv file is replaced with an .html file, ->get() returns the content of the file.
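The dump above reports status 304 (Not Modified) with an empty '_content', which is consistent with the empty string that was printed. A minimal sketch of checking the status before using the body follows. The response is built by hand with HTTP::Response so the sketch runs without a browser, and describe_response is a hypothetical helper, not part of WWW::Mechanize::Chrome.

```perl
use strict;
use warnings;
use HTTP::Response;

# Stand-in for the object returned by $mech->get($url). It is built by hand
# (an assumption for illustration) to mirror the 304 reply in the dump.
my $response = HTTP::Response->new( 304, 'Not Modified' );

# Hypothetical helper: summarise the fields that matter when a body looks empty.
sub describe_response {
    my ($res) = @_;
    return sprintf '%s (success: %s, body length: %d)',
        $res->status_line,
        $res->is_success ? 'yes' : 'no',
        length( $res->content // '' );
}

my $summary = describe_response($response);
print "$summary\n";
```

If the summary shows a non-2xx status, the empty body comes from the reply itself rather than from how the content is read.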
Re^15: Need help with WWW::Mechanize and Chrome cookies
by bakiperl (Beadle) on Jul 17, 2021 at 19:54 UTC
    Corion says: Did you set the download directory (download_directory option) in the constructor?
    -----------

    The download_directory finally worked once I found out that Chrome does not accept paths written with forward slashes ( c:/path... ).
    The download path has to use backslashes instead ( c:\path... ).
    my $downloads = "C:\\path\\";
    $mech->set_download_directory($downloads);
    $mech->get($foo);
    My final question is how to stop the Chrome browser from displaying some documents so that they can be downloaded with WMC. If a document (such as a jpg) is loaded in the browser, the file does not download.
    Thank you for your patience.
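If the directory is held elsewhere in the script with forward slashes, it can be converted before it is handed to the browser. This is a minimal sketch under that assumption; to_windows_path is a hypothetical helper name, and the call to set_download_directory is left as a comment because it needs a running Chrome.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every forward slash with a backslash,
# the form the post above reports Chrome accepting on Windows.
sub to_windows_path {
    my ($path) = @_;
    ( my $converted = $path ) =~ tr{/}{\\};
    return $converted;
}

my $downloads = to_windows_path('C:/path/');
print "$downloads\n";

# With a live browser object this would then be used as in the post above:
# $mech->set_download_directory($downloads);
```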

      If a file does not download, have you tried inspecting the HTTP::Response object you receive from the ->get() call?

      my $response = $mech->get($url);
      open my $output, '>:raw', '/tmp/output.jpg'
          or die "Cannot open /tmp/output.jpg: $!";
      print { $output } $response->decoded_content;

      Edit: You might need to touch the ->content of the browser first so that everything has time to initialize:

      my $response = $mech->get($url);
      # Dummy request to initialize everything
      my $c = $mech->content;
      open my $output, '>:raw', '/tmp/output.jpg'
          or die "Cannot open /tmp/output.jpg: $!";
      print { $output } $response->decoded_content;
        Corion,
        This code did the trick for image files but not for PDFs. I also noticed that it works only for plain URLs in the page; it did not work for me when the images are displayed using JavaScript.
        I was hoping to find a way to block the browser from displaying the documents by using something like the Content-Disposition option.
        Content-Disposition: attachment; filename=$filename
        This has worked very well with WWW::Mechanize.
        Thank you.
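When a response does carry a Content-Disposition header like the one above, HTTP::Response can report the suggested file name through its filename method. The sketch below only illustrates that call on a hand-built response; the header value, the name report.csv, the body and the fallback name are all made up, and whether WWW::Mechanize::Chrome passes this header through for a given download is not something the thread confirms.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built stand-in for a download response (illustrative values only).
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Disposition' => 'attachment; filename="report.csv"' ],
    "a,b\n1,2\n",
);

# filename() consults Content-Disposition first and returns the suggested name.
my $name = $response->filename;

# Fall back to a fixed, hypothetical name if the server suggested none.
$name = 'download.bin' unless defined $name && length $name;

print 'would save ', length( $response->content ), " bytes as $name\n";
```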
