in reply to •Re: Chaining proxies with LWP::UserAgent
in thread Chaining proxies with LWP::UserAgent

Thanks for the quick reply. I was mistaken about MSIE's capabilities; my apologies.

However, HTTP does indeed permit this, via the CONNECT method as described in RFC 2616:

This specification reserves the method name CONNECT for use with a proxy that can dynamically switch to being a tunnel (e.g. SSL tunneling [44]).

RFC 2817 details usage of the CONNECT method:

5.2 Requesting a Tunnel with CONNECT

A CONNECT method requests that a proxy establish a tunnel connection on its behalf. The Request-URI portion of the Request-Line is always an 'authority' as defined by URI Generic Syntax [2], which is to say the host name and port number destination of the requested connection separated by a colon:

CONNECT server.example.com:80 HTTP/1.1
Host: server.example.com:80

To "chain" two proxies, one simply sends a CONNECT to the first proxy, requesting to connect to the second proxy. Once the tunnel is established, normal GET requests with a fully-qualified URI can be used. This works.
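As a rough sketch of that sequence, the helpers below only build the messages a client would write, in order, on its single TCP connection to the first proxy. The sub names, host names and ports are hypothetical, and the code assumes the client waits for a 2xx reply to each CONNECT before sending the next message.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";    # HTTP line terminator, spelled out for portability

# Build a CONNECT request asking the current proxy for a tunnel to host:port.
sub connect_request {
    my ($host, $port) = @_;
    return "CONNECT $host:$port HTTP/1.1$CRLF"
         . "Host: $host:$port$CRLF"
         . $CRLF;
}

# Build a GET with a fully-qualified URI, as sent to the last proxy in the chain.
sub proxied_get {
    my ($url) = @_;
    my ($authority) = $url =~ m{^http://([^/]+)}i
        or die "not an absolute http URL: $url\n";
    return "GET $url HTTP/1.1$CRLF"
         . "Host: $authority$CRLF"
         . "Connection: close$CRLF"
         . $CRLF;
}

# Given [[host, port], ...] (first hop first) and a target URL, return the
# messages to send in order. The first proxy is reached by a plain TCP
# connect, so it gets no CONNECT of its own.
sub chain_messages {
    my ($proxies, $url) = @_;
    my @hops = @$proxies;
    shift @hops;
    return ((map { connect_request(@$_) } @hops), proxied_get($url));
}
```

For two proxies this yields one CONNECT (naming the second proxy) followed by the GET; each extra proxy adds one more CONNECT in front.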

I know HTTP; I just want to coax Perl into speaking this portion of the specification. Can I, using the standard modules? I searched the LWP documentation for any mention of the CONNECT method, but couldn't find any occurrences. Are there any other HTTP modules which fully support the HTTP/1.1 spec, or do I have to write my own?

Re: Re: •Re: Chaining proxies with LWP::UserAgent
by tilly (Archbishop) on Jun 28, 2003 at 17:15 UTC
    As the documentation for LWP says, the best place to discuss exact details for complex stuff is libwww@perl.org. However, from browsing the documentation and code, it looks like there is an assumption built into the code that all requests will be mediated through only one level of proxy, and that which proxy to use will be a function of the communication scheme used. (In other words, it assumes a configuration which matches any browser you have ever seen.)

    But if you use the full interface, your LWP::UserAgent object has methods named send_request(), simple_request() and just request() which send off requests with various levels of preparation and munging first. But the important thing is that they can be anything that you want.

    So I would suggest playing with using no proxy, and then start sending your own CONNECT requests, and see whether you can get it to send the correct sequence to chain levels of proxying.

    PS I have only seen multiple levels of proxying used by people who were attempting to use various open proxies to anonymize themselves. Having looked at the traffic that passed through one such proxy, the users paid attention to how much the server reported to others, and didn't seem to realize that the server they are proxying off of can keep logs including the information not passed on, and do things like hand it over to law enforcement... (Yes, I do know of at least one case where law enforcement took full advantage of this.)

      But if you use the full interface, your LWP::UserAgent object has methods named send_request(), simple_request() and just request() which send off requests with various levels of preparation and munging first. But the important thing is that they can be anything that you want.
      If only that were the case.

      The syntax of CONNECT is: CONNECT host:port HTTP/1.0.

      LWP overloads the URI field to contain both a) the host and port to connect to (to send the request to) and b) the URI to send in the request. I do HTTP::Request->new("CONNECT", "http://proxy1_host:proxy1_port/proxy2_host:proxy2_port", ...) and LWP sends CONNECT /proxy2_host:proxy2_port to proxy1_host:proxy1_port. So I hack LWP::Protocol::http to not send the slash.

      Now comes the tunnelling. I'm supposed to be able to communicate through the tunnel once the connection is established and the tunnel sends HTTP/1.0 200 Connection established. I thought this would be as simple as sending an HTTP request in the content field:

      my $req = HTTP::Request->new("CONNECT", "http://proxy1_host:proxy1_port/proxy2_host:proxy2_port", HTTP::Headers->new(), "GET http://final_destination.example.com/ HTTP/1.0\cM\cJ\cM\cJ");

      Not so. The content comes before the response (which is expected, I guess):

      CONNECT proxy2_host:proxy2_port HTTP/1.1
      TE: deflate,gzip;q=0.3
      Connection: TE, close
      Host: proxy2_host:proxy2_port
      User-Agent: libwww-perl/5.68
      Content-Length: 39

      GET http://final_destination.example.com/ HTTP/1.0

      HTTP/1.0 200 Connection established
      So I can't get here from there, through LWP, as far as I can see.
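What the tunnel needs is the opposite ordering: send the CONNECT alone, read the proxy's reply up to the blank line, and only then write the inner request. A minimal sketch of the "read the reply first" step follows. It assumes you hold the raw socket yourself rather than going through LWP, and read_connect_reply is a made-up name, not part of any module.

```perl
use strict;
use warnings;

# Read a proxy's reply to CONNECT from $fh, up to the blank line that ends
# the headers. Returns the three-digit status code; dies if the reply is
# malformed or the connection closes early. Anything after the blank line
# is left unread, because it already belongs to the tunnel.
sub read_connect_reply {
    my ($fh) = @_;
    local $/ = "\012";                            # read LF-terminated lines
    my $status_line = <$fh>;
    defined $status_line or die "no reply from proxy\n";
    my ($code) = $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})\b}
        or die "malformed status line: $status_line";
    while (defined(my $line = <$fh>)) {
        return $code if $line =~ /^\015?\012$/;   # blank line: headers done
    }
    die "connection closed before end of headers\n";
}
```

A caller would treat any 2xx code as "tunnel established" and anything else as a refusal by that proxy.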

      HTTP::Lite suffers from the same disease.

      request ( $url, $data_callback, $cbargs )
      Initiates a request to the specified URL.

      The host to connect to and the request to send are stuffed inside the $url parameter. I don't want to use URIs more than necessary, I just want to speak HTTP.

      HTTP::MHTTP also crams the host to connect to and the request into a URI, passed to http_call.

      Net::HTTPTunnel can do HTTP tunnelling through arbitrary TCP services. The only drawback is that I'll have to write my own HTTP, unless somehow one of these HTTP modules can be instructed to communicate on a given socket created by Net::HTTPTunnel. That's a reasonable trade-off, I suppose.
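"Writing my own HTTP" over an existing socket need not be much code for a plain HTTP/1.0 GET. The sketch below assumes $sock is any already-connected, bidirectional handle (a tunnel socket, for instance) and that the peer closes the connection when the response is complete. get_over_handle is a hypothetical helper, and it handles neither chunked encoding nor keep-alive.

```perl
use strict;
use warnings;
use IO::Handle;

# Send a minimal HTTP/1.0 GET for $url over the connected handle $sock and
# return (status code, hashref of lower-cased headers, body).
sub get_over_handle {
    my ($sock, $url) = @_;
    my ($authority) = $url =~ m{^http://([^/]+)}i
        or die "not an absolute http URL: $url\n";
    my $CRLF = "\015\012";
    print {$sock} "GET $url HTTP/1.0$CRLF", "Host: $authority$CRLF", $CRLF
        or die "write failed: $!\n";
    $sock->flush;

    # HTTP/1.0 without keep-alive: the response ends when the peer closes.
    my $raw = do { local $/; <$sock> };
    defined $raw && length $raw or die "no response\n";

    my ($head, $body) = split /\015?\012\015?\012/, $raw, 2;
    my ($status_line, @header_lines) = split /\015?\012/, $head;
    my ($code) = $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})}
        or die "malformed status line\n";

    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value;
    }
    return ($code, \%headers, defined $body ? $body : '');
}
```

The request line carries the full URL because the far end of the handle is assumed to be a proxy; for a tunnel straight to the origin server you would send only the path.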

      PS I have only seen multiple levels of proxying used by people who were attempting to use various open proxies to anonymize themselves.
      RFC 2817 seems to disagree:
      It may be the case that the proxy itself can only reach the requested origin server through another proxy. In this case, the first proxy SHOULD make a CONNECT request of that next proxy, requesting a tunnel to the authority.
      I try to code for all relevant cases, and since I'm writing a program mainly for proxies this is relevant.

      It's definitely possible that one will use an open proxy, but my program will also accept internal proxies; no distinction is made. Perhaps I could do a DNSRBL lookup, although such a check would slow down execution.

      Having looked at the traffic that passed through one such proxy, the users paid attention to how much the server reported to others, and didn't seem to realize that the server they are proxying off of can keep logs including the information not passed on, and do things like hand it over to law enforcement... (Yes, I do know of at least one case where law enforcement took full advantage of this.)
      Thanks for the heads-up -- I plan to take full advantage of this, also; but that's another thread.
        Having looked at the LWP::UserAgent and LWP::Protocol::http code, I think it should be doable to edit them so that the proxy can be either an array ref or a string of proxies joined by some convenient divider (e.g. |). The idea is that wherever you see a reference to a proxy, you loop through the proxies and do for each of them what is currently done for one. From a scan of the code, adding chained proxy support to HTTP looks doable (just a handful of lines in a couple of modules; grep for "proxy", and run "perldoc -l LWP::UserAgent" to find where the module is on your system) and probably a lot easier than writing significant new code. LWP is distributed under the same terms as Perl, which are pretty generous, so legal issues are unlikely to be a problem for you.
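To make the "array ref or |-separated string" idea concrete, here is a small stand-alone sketch of the parsing and looping part only. It is not a patch to LWP. The sub names are invented, and the default port of 80 for a proxy given without a port is an assumption.

```perl
use strict;
use warnings;

# Accept either an array ref of proxy specs or one '|'-separated string and
# return a list of [host, port] pairs, first hop first.
# Assumption: a spec without an explicit port defaults to port 80.
sub parse_proxy_chain {
    my ($spec) = @_;
    my @urls = ref($spec) eq 'ARRAY' ? @$spec : split /\|/, $spec;
    return map {
        my ($host, $port) = m{^(?:http://)?([^:/|\s]+)(?::(\d+))?/?$}i
            or die "bad proxy spec: $_\n";
        [ $host, defined $port ? $port : 80 ];
    } @urls;
}

# For every hop after the first, the host:port to name in a CONNECT request.
# The first proxy is reached with an ordinary TCP connect instead.
sub connect_targets {
    my @chain = @_;
    shift @chain;
    return map { "$_->[0]:$_->[1]" } @chain;
}
```

With a chain of three proxies, connect_targets returns two entries: the second proxy (sent to the first) and the third proxy (sent through the resulting tunnel to the second).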

        If you don't think your Perl is up to it, ask on the appropriate list and you will likely find someone else who can do it. (The speed with which they add features for you might be affected by financial encouragement...)

        If you then contribute that back, a lot of people will suddenly be able to easily use chained proxies in Perl if they need it. :-)

        PS My comment about chained proxies was not a statement that they aren't useful, just a comment about where I happened to have seen them before.