klaymen has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a kind of webcrawler that uses LWP::UserAgent to download large numbers of webpages, basically like this:

    my $ua = LWP::UserAgent->new(ssl_opts => { SSL_verify_mode => 'SSL_VERIFY_NONE'},);
    $ua->timeout(45);
    ...
    my $request = HTTP::Request->new(GET => $trg);
    $request->protocol('HTTP/1.0');
    $request->header('Accept' => '*/*');
    $request->header('Connection' => 'Close');
    ...
    my $resp = $ua->request($request);
    ...

Results are written into a database. Now my question is if there's a way to find out what local port was used for the connection. I don't need to set the port, but would like to know afterwards what port was chosen.

Background: I need some way to uniquely link the connections the crawler makes with the data that a sniffer collects at the same time (the sniffer implements suricata rules and a passive SSL collector). As the crawler makes about 20 queries per second, the timestamp alone is not sufficient. As SSL connections are also used, I can't use the URL (because the sniffer doesn't see it). The destination IP would be an alternative, but might not be unique (the crawler might access the same IP several times in a row for different URLs). So the local port number would be a good option, together with the timestamp of course. But for this to work, I must be able to figure out the port number the crawler uses (plus prevent shared connections, hence the Connection: close header). I could also try to set the local port in advance (I did read a posting that shows a way to do it), but that can cause errors (the same port being used twice).

One way would be if I could somehow access the socket that LWP used/will use. Any suggestions are highly welcome :-)

Thanks, Andy

Replies are listed 'Best First'.
Re: LWP: How to find out local port number? (ISA)
by tye (Sage) on Oct 13, 2014 at 17:04 UTC

    You can tell LWP::Protocol that you want My::Https to be the class that implements the https protocol then write a My::Https class that mostly just inherits from LWP::Protocol::https but also provides an overridden _new_socket() method that calls SUPER::_new_socket(), records the local port number of the returned socket, then returns the socket.

    - tye        
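
    A minimal sketch of this approach for plain http, assuming the stock LWP::Protocol::http internals (_new_socket() and socket_class() are undocumented and may change between LWP versions). The $last_local_port variable and the throwaway local server are hypothetical and only there for illustration. The same pattern should carry over to LWP::Protocol::https with LWP::Protocol::https::Socket, but treat that as an assumption.

    ```perl
    use strict;
    use warnings;

    package MyHttp;
    use parent 'LWP::Protocol::http';

    our $last_local_port;    # hypothetical holder for the most recent local port

    # The inherited socket_class() would derive "MyHttp::Socket" from the class
    # name, so point it back at the stock socket class.
    sub socket_class { 'LWP::Protocol::http::Socket' }

    sub _new_socket {
        my ($self, @args) = @_;                  # @args is (host, port, timeout)
        my $sock = $self->SUPER::_new_socket(@args);
        $last_local_port = $sock->sockport;      # local end of the TCP connection
        return $sock;
    }

    package main;
    use IO::Socket::INET;
    use LWP::UserAgent;

    LWP::Protocol::implementor(http => 'MyHttp');

    # Throwaway local server so the sketch runs without outside network access;
    # it answers one request with the client port it saw.
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1', LocalPort => 0, Listen => 1,
    ) or die "listen failed: $!";
    my $server_port = $listener->sockport;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        my $client = $listener->accept or exit 1;
        while (my $line = <$client>) { last if $line =~ /^\r?\n$/ }   # skip request headers
        my $body = $client->peerport;
        print $client "HTTP/1.0 200 OK\r\nContent-Length: " . length($body) . "\r\n\r\n$body";
        close $client;
        exit 0;
    }
    close $listener;

    my $ua   = LWP::UserAgent->new(timeout => 10);
    my $resp = $ua->get("http://127.0.0.1:$server_port/", Connection => 'close');
    waitpid($pid, 0);

    my $seen_by_server = $resp->decoded_content;
    print "recorded local port: $MyHttp::last_local_port\n";
    print "port seen by server: $seen_by_server\n";
    ```

    In the crawler, $MyHttp::last_local_port could be stored next to the timestamp right after each request. A single package variable like this is only reliable while each process issues one request at a time.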

      Thanks, that sounds promising... unfortunately I must admit I'm fighting a bit with it (I don't often use the OO interface). Can you tell me what's wrong with this (just a skeleton, without added functionality yet, and only for http)?

          use strict;
          use LWP::UserAgent;

          package MyHttp;
          use vars qw(@ISA);
          require LWP::Protocol::http;
          @ISA = qw( LWP::Protocol::http );

          sub _new_socket {
              my($self, $host, $port, $timeout) = @_;
              my $s;
              print "Creating New socket: $host, $port, $timeout\n";
              $s = $self->SUPER::_new_socket($host,$port,$timeout);
              print "ok\n";
              return $s;
          }

          package main;
          LWP::Protocol::implementor( http => 'MyHttp' );
          my $ua = LWP::UserAgent->new(ssl_opts => { SSL_verify_mode => 'SSL_VERIFY_NONE'},);
          $ua->agent('Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.0)');
          $ua->max_size(50000000); # 50MB at most
          $ua->timeout(45);
          my $request = HTTP::Request->new(GET => "http://www.google.ch/");
          $request->protocol('HTTP/1.0'); # we don't want chunked replies
          $request->header('Accept' => '*/*');
          $request->header('Accept-Encoding' => ''); # we don't want packed results
          $request->header('Connection' => 'Close');
          $request->header('Cache-Control' => 'no-cache');
          my $resp = $ua->request($request);
          if ($resp->is_success) {
              my $data = $resp->content;
              print "$data\n";
          }

      It produces:

          $ perl porttest.pl
          Creating New socket: www.google.ch, 80, 45
          $

      As the first test message is printed, _new_socket is called, but calling SUPER::_new_socket evidently fails somehow (unfortunately I don't get any error message). Actually, even if I don't override anything, it does not work. Maybe some kind of constructor must be added (isn't it inherited from the base class by default)?

        That's where I'd jump into the Perl debugger and figure out why it is silently failing.

        Inspecting the code, I found:

            sub socket_class
            {
                my $self = shift;
                (ref($self) || $self) . "::Socket";
            }

        So you might try adding:

            package MyHttp;
            sub socket_class { "LWP::Protocol::http::Socket" }

        But I don't see why that problem would cause a silent failure.

        - tye        
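
        One likely explanation for the silence: LWP::UserAgent traps exceptions raised while a request is being handled and returns them as an error response, and the test script only prints anything when is_success is true. A minimal sketch that makes the failure visible, assuming the socket_class() code quoted above (BrokenHttp is a hypothetical subclass that deliberately omits the override):

        ```perl
        use strict;
        use warnings;

        package BrokenHttp;                 # hypothetical: no socket_class override
        use parent 'LWP::Protocol::http';

        package main;
        use LWP::UserAgent;

        LWP::Protocol::implementor(http => 'BrokenHttp');

        my $ua   = LWP::UserAgent->new(timeout => 5);
        my $resp = $ua->get('http://127.0.0.1/');

        # The exception from the protocol module ends up in the response, so
        # report the failure branch as well as the success branch.
        if ($resp->is_success) {
            print $resp->decoded_content, "\n";
        }
        else {
            print "request failed: ", $resp->status_line, "\n";
        }
        ```

        Printing status_line in the else branch of the original script should show what SUPER::_new_socket actually died with.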

        Try   $self->SUPER::_new_socket( @_ ); for starters :)
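
        Note that forwarding @_ only works if the invocant has been shifted off first. The skeleton above copies its arguments with my(...) = @_, which leaves $self in @_, so passing @_ unchanged would hand $self to the parent method twice. A small sketch of the two equivalent forms, using hypothetical Base and Derived classes rather than the LWP ones:

        ```perl
        use strict;
        use warnings;

        package Base;
        sub new      { return bless {}, shift }
        sub describe { my ($self, @args) = @_; return join ',', @args }

        package Derived;
        our @ISA = ('Base');

        # Variant 1: copy from @_ without shifting, then forward the named arguments.
        sub by_name { my ($self, $host, $port) = @_; return $self->SUPER::describe($host, $port) }

        # Variant 2: shift the invocant off first, then forward what is left in @_.
        sub by_rest { my $self = shift; return $self->SUPER::describe(@_) }

        package main;
        my $obj = Derived->new;
        print $obj->by_name('www.example.com', 80), "\n";
        print $obj->by_rest('www.example.com', 80), "\n";
        ```

        Both calls pass the same two arguments to the parent method.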