Polyglot has asked for the wisdom of the Perl Monks concerning the following question:

I use the following code as part of a script to download URLs from a remote server. I have root access (ssh) to the remote webserver, and am the webmaster for the files I am downloading. I am not connecting to Google, so far as I am aware, anywhere in this process--I don't even have any Google ads or tracking set up on my site.

Mysteriously, Google is apparently feeding cookies to my cookie jar which was created (I think) by HTTP::Cookies and seemingly fed by LWP. Where might these be coming from?

Relevant code:

#!/usr/bin/perl
#WGET-STYLE DOWNLOADER VIA LWP::UserAgent;

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use Encode qw(encode decode);

my $user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36';
my $http_header = 'Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5; Accept-Charset: UTF-8, iso-8859-1; Accept-Language: en-US,en;q=0.5; Accept-Encoding: gzip,deflate; Connection: keep-alive';

# [snip] skip to pertinent subroutine . . .

sub fetch {
    if ($DEBUG) { print "SUB fetch: $_ \n" };
    my $url = shift @_;
    my $agency = $user_agent || 'Mozilla/5.0 (Windows NT 6.1)';
    my ($method, $uri, $data) = @ARGV;
    $method = 'GET' unless $method;
    $uri = $url unless $uri;
    $data = '' unless $data;

    my $browser = new LWP::UserAgent();
    $browser->cookie_jar( {} );
    $browser->requests_redirectable(['GET', 'HEAD', 'POST']);
    $browser->max_redirect(5);
    $browser->agent($agency);
    $browser->get($url);

    my $request = HTTP::Request->new();
    $request->method($method);
    $request->uri($uri);
    if (uc($method) eq 'POST') {
        if ($http_header) {
            $request->header($http_header);
        } else {
            $request->header('Content-Type' => 'application/x-www-form-urlencoded');
        }
        $request->content($data);
    }

    my $jar = HTTP::Cookies->new();
    $jar->load('./HTTP-Cookies.jar'); #LOCAL FILE, WRITABLE
    $jar->add_cookie_header($request);

    my $response = $browser->request($request);
    $request = $response->request();
    $jar->extract_cookies($response);
    $jar->save('HTTP-Cookies.jar');

    $response = decode("utf8", $response->as_string());

    unless ($keep_response_header) {
        #TO REMOVE THE HTTP HEADERS (NOTE: _NOT_ THE HTML HEADERS)
        #~~~I COULDN'T THINK OF A WORKABLE ONE-LINER FOR THIS~~~
        #---------
        my @lump = split(/\n/, $response);
        my $hdr = 1;
        my $line = '';
        while (($hdr) && (@lump)) {
            $line = shift @lump;
            if ($line =~ m/^\s*$/) {$hdr=0};
        }
        $response = join "\n", @lump;
        #---------
    }

    return $response;
}
#END SUB fetch

Cookie jar contents after script execution

#LWP-Cookies-1.0
Set-Cookie3: 1P_JAR=2022-04-24-16; path="/"; domain=.google.com; path_spec; secure; expires="2022-05-24 16:49:33Z"; version=0
Set-Cookie3: AEC=AakniGM0siZKGKmyFVveOarPvbRyMhhgILvlobJdmPIlHSZDzBcH9ydhdZs; path="/"; domain=.google.com; path_spec; secure; expires="2022-10-21 16:49:33Z"; HttpOnly; SameSite=lax; version=0
Set-Cookie3: NID="511=b4k5SZAGi5bJDr41ZOmk-PAN1cFp0SiGD39_9e4AyeoCoHqycr9_QS13X_oMwyA055BRm46An2txQ9XYUI0QZK8zU2j5NP_BGmVBHyrDggE_NzYqDVzk5NU1Q2PzPEvenKLIVkPXQVbJTM664h7byByPmnioIKx3vvYpjxr_314"; path="/"; domain=.google.com; path_spec; expires="2022-10-24 16:49:33Z"; HttpOnly; version=0

Blessings,

~Polyglot~


Replies are listed 'Best First'.
Re: Why does HTTP::Cookies and/or LWP add Google cookies to my cookie jar when visiting another site?
by Corion (Patriarch) on Apr 27, 2022 at 11:10 UTC

    There are two easy ways Google cookies could come into the mix. One is that your cookie file is already filled with them. In that case the easy solution would be to delete these lines.

    The second is that one of the URLs you fetch may redirect to Google, for whatever reason. One way to see the requests that LWP makes is to use LWP::ConsoleLogger:

    use LWP::ConsoleLogger::Easy qw( debug_ua );
    debug_ua($browser);

    ... and then watch as the requests scroll by.
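
Where LWP::ConsoleLogger is not available, LWP::UserAgent's own handler hooks can give a rough view of the same traffic. The following is only a sketch under that assumption, not what the reply above used: `attach_request_log` and `@log` are hypothetical names, and only the `request_send` and `response_done` phases of `add_handler` are relied on.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: records every request and response that passes
# through $ua, including the hops made while following redirects.
sub attach_request_log {
    my ($ua, $log) = @_;

    # Runs before each request is sent. It must return nothing: a true
    # return value would be taken as the response and stop the request.
    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        push @$log, sprintf('> %s %s', $request->method, $request->uri);
        return;
    });

    # Runs once per response, redirect responses included.
    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        push @$log, sprintf('< %s', $response->status_line);
        push @$log, "  Set-Cookie: $_" for $response->header('Set-Cookie');
        return;
    });

    return $ua;
}

my @log;
my $browser = attach_request_log(LWP::UserAgent->new, \@log);

# Usage (needs network access), with $url being whatever fetch() is given:
#   $browser->get($url);
#   print "$_\n" for @log;
```

Because LWP follows redirects itself, a hop to another host should show up as an extra `>` line, together with any `Set-Cookie` headers that host sent.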

      Thank you for the clues. After emptying the cookie jar and retrying, the script now generates an error message which says:

      ./HTTP-Cookies.jar does not seem to contain cookies at /System/Library/Perl/Extras/5.12/HTTP/Cookies.pm line 432.

      ...and the file remains empty, except for this:

      #LWP-Cookies-1.0

      It was about this point that I remembered that I had pointed the script once, for testing purposes, at another website--one that I do not own. Evidently, the cookies in the jar remained permanently after that, which was not the behavior I had expected.

      I was unable to install the module you recommended, unfortunately. I got only error messages, and it failed to install. Disappointing.

      Now, why is it that the cookie jar seems necessary in order to complete the GET request? ... but maybe I'm just misunderstanding the process again.

      Blessings,

      ~Polyglot~

        Evidently, the cookies in the jar remained permanently after that, which was not the behavior I had expected.

        Persistently storing the cookies is the only reason to have a cookie jar present on the filesystem.


        🦛
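
To illustrate the point about persistence: a file-backed jar hands back whatever an earlier run saved, until those entries are removed. This is a minimal sketch, assuming a throwaway jar file in a temporary directory; the path, cookie names, and values are made up for illustration. Passing a domain to `clear` removes only that domain's cookies.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use File::Temp qw(tempdir);

# Hypothetical throwaway jar file, standing in for ./HTTP-Cookies.jar.
my $jar_file = tempdir(CLEANUP => 1) . '/cookies.jar';

# First "run": cookies from two sites are stored and saved to disk.
my $first = HTTP::Cookies->new(file => $jar_file);
# set_cookie(version, key, value, path, domain, port, path_spec, secure, maxage)
$first->set_cookie(0, 'NID',     'made-up', '/', '.google.com',     undef, 1, 1, 3600);
$first->set_cookie(0, 'session', 'made-up', '/', 'www.example.org', undef, 1, 0, 3600);
$first->save;

# Later "run": loading the same file brings both cookies back.
my $second = HTTP::Cookies->new(file => $jar_file);
my $before = $second->as_string;

# Remove only the unwanted domain, then write the jar back out.
$second->clear('.google.com');
$second->save;
my $after = $second->as_string;

print "before:\n${before}after:\n${after}";
```

A jar created with `HTTP::Cookies->new()` and never saved keeps its cookies in memory only, so nothing carries over between runs.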

Re: Why does HTTP::Cookies and/or LWP add Google cookies to my cookie jar when visiting another site?
by Bod (Parson) on Apr 27, 2022 at 12:02 UTC
      Wow. I had no idea Google even published such things as what cookies their services deliver. Thank you for educating me.

      Blessings,

      ~Polyglot~