in reply to Chaining proxies with LWP::UserAgent

There's nothing in the HTTP protocol that even permits this. If IE is letting you say that, it's not able to deliver on it. So, it doesn't matter how many times you rewrite the HTTP implementation... there's no point. There's no character sequence that does this.

-- Randal L. Schwartz, Perl hacker


Re: •Re: Chaining proxies with LWP::UserAgent
by Anonymous Monk on Jun 28, 2003 at 08:54 UTC
    Thanks for the quick reply. I was mistaken about MSIE's capabilities, my apologies.

    However, HTTP does indeed permit this, via the CONNECT method as described in RFC 2616:

    This specification reserves the method name CONNECT for use with a proxy that can dynamically switch to being a tunnel (e.g. SSL tunneling [44]).

    RFC 2817 details usage of the CONNECT method:

    5.2 Requesting a Tunnel with CONNECT

    A CONNECT method requests that a proxy establish a tunnel connection on its behalf. The Request-URI portion of the Request-Line is always an 'authority' as defined by URI Generic Syntax [2], which is to say the host name and port number destination of the requested connection separated by a colon:

    CONNECT server.example.com:80 HTTP/1.1
    Host: server.example.com:80

    To "chain" two proxies, one simply sends a CONNECT to the first proxy, requesting to connect to the second proxy. Once the tunnel is established, normal GET requests with a fully-qualified URI can be used. This works.
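    The two-proxy sequence just described can be sketched in core Perl. This only builds the bytes to write; opening the socket to the first proxy and reading its 200 reply are left out. The helper names connect_request and tunnelled_get, and all host names, are hypothetical.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# CONNECT request asking a proxy to open a tunnel to $host:$port.
# The Request-URI is the bare authority (host:port): no scheme, no slash.
sub connect_request {
    my ($host, $port) = @_;
    my $authority = "$host:$port";
    return "CONNECT $authority HTTP/1.1$CRLF"
         . "Host: $authority$CRLF"
         . $CRLF;                      # blank line ends the header block
}

# Once the first proxy has answered with a 2xx, the socket is a raw pipe to
# the second proxy, so an ordinary proxy request with an absolute URI follows.
sub tunnelled_get {
    my ($url) = @_;
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:]+)}i
        or die "not an absolute URI: $url";
    return "GET $url HTTP/1.0$CRLF"
         . "Host: $host$CRLF"
         . $CRLF;
}

# Hypothetical chain: client -> proxy1 -> proxy2 -> final destination.
print connect_request('proxy2.example.net', 8080);
print tunnelled_get('http://final-destination.example.com/');
```

    The first string goes to a socket connected to the first proxy; only after its 2xx reply is the second string written on the same socket.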

    I know HTTP; I just want to coax Perl into speaking this portion of the specification. Can I, using the standard modules? I searched the LWP documentation for any mention of the CONNECT method, but couldn't find any occurrences... are there any other HTTP modules which fully support the HTTP/1.1 spec, or do I have to write my own?

      As the documentation for LWP says, the best place to discuss exact details for complex stuff is libwww@perl.org. However, from browsing the documentation and code, it looks like there is an assumption built into the code that all requests will be mediated through only one level of proxy, and that which proxy to use will be a function of the communication scheme used. (In other words, it assumes a configuration which matches any browser you have ever seen.)

      But if you use the full interface, your LWP::UserAgent object has methods named send_request(), simple_request() and plain request(), which send off requests with various levels of preparation and munging first. But the important thing is that the requests can be anything that you want.

      So I would suggest configuring no proxy at all, then sending your own CONNECT requests, and seeing whether you can get it to send the correct sequence to chain levels of proxying.

      PS I have only seen multiple levels of proxying used by people who were attempting to use various open proxies to anonymize themselves. Having looked at the traffic that passed through one such proxy, the users paid attention to how much the server reported to others, and didn't seem to realize that the server they are proxying off of can keep logs including the information not passed on, and do things like hand it over to law enforcement... (Yes, I do know of at least one case where law enforcement took full advantage of this.)

        But if you use the full interface, your LWP::UserAgent object has methods named send_request(), simple_request() and plain request(), which send off requests with various levels of preparation and munging first. But the important thing is that the requests can be anything that you want.
        If only that were the case.

        The syntax of CONNECT is: CONNECT host:port HTTP/1.0.

        LWP overloads the URI field to contain both a) the host and port to connect to (to send the request to) and b) the URI to send in the request. If I do HTTP::Request->new("CONNECT", "http://proxy1_host:proxy1_port/proxy2_host:proxy2_port", ...), LWP sends CONNECT /proxy2_host:proxy2_port to proxy1_host:proxy1_port. So I hacked LWP::Protocol::http to not send the slash.
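        What the LWP::Protocol::http hack is meant to achieve can be shown in plain Perl, without patching LWP: split the overloaded URL into the proxy to dial and a CONNECT line whose target has no leading slash. The helper split_overloaded_url and the example hosts are hypothetical.

```perl
use strict;
use warnings;

# As described above, the proxy to dial comes from the URL's host:port and the
# Request-URI from its path, so "http://proxy1:3128/proxy2:8080" would give
# "CONNECT /proxy2:8080". This helper splits the URL and drops that slash.
sub split_overloaded_url {
    my ($url) = @_;
    my ($dial_host, $dial_port, $path) =
        $url =~ m{^http://([^/:]+):(\d+)(/.*)$}
        or die "unexpected URL shape: $url";
    (my $target = $path) =~ s{^/}{};    # authority form: no leading slash
    $target =~ /^[^\/:]+:\d+$/ or die "not host:port: $target";
    return ($dial_host, $dial_port, "CONNECT $target HTTP/1.0");
}

my ($host, $port, $line) =
    split_overloaded_url('http://proxy1.example.net:3128/proxy2.example.net:8080');
print "dial $host:$port, send: $line\n";
# prints: dial proxy1.example.net:3128, send: CONNECT proxy2.example.net:8080 HTTP/1.0
```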

        Now comes the tunnelling. I'm supposed to be able to communicate through the tunnel once the connection is established and the proxy replies HTTP/1.0 200 Connection established. I thought this would be as simple as sending an HTTP request in the content field:

        my $req = HTTP::Request->new("CONNECT", "http://proxy1_host:proxy1_port/proxy2_host:proxy2_port",
            HTTP::Headers->new(), "GET http://final_destination.example.com/ HTTP/1.0\cM\cJ\cM\cJ");   # CRLF CRLF ends the inner request

        Not so. LWP sends the content as the body of the CONNECT request, before the proxy's response arrives (which is to be expected, I guess):

        CONNECT proxy2_host:proxy2_port HTTP/1.1
        TE: deflate,gzip;q=0.3
        Connection: TE, close
        Host: proxy2_host:proxy2_port
        User-Agent: libwww-perl/5.68
        Content-Length: 39

        GET http://final_destination.example.com/ HTTP/1.0

        HTTP/1.0 200 Connection established

        So I can't get here from there, through LWP, as far as I can see.
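        Outside LWP, the required order (send CONNECT, read and check the reply, only then write the tunnelled request) is easy to enforce by hand. A core-Perl sketch; read_connect_reply is a hypothetical helper, and the in-memory filehandle merely stands in for a socket connected to the first proxy.

```perl
use strict;
use warnings;

# Read a proxy's reply to CONNECT from $fh, up to and including the blank line
# that ends the headers. Returns the numeric status code; dies on a malformed
# status line. Only after a 2xx should anything be written into the tunnel.
sub read_connect_reply {
    my ($fh) = @_;
    my $status_line = <$fh>;
    defined $status_line or die "proxy closed the connection\n";
    my ($code) = $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})}
        or die "bad status line: $status_line";
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^\015?\012\z/;    # end of the header block
    }
    return $code;
}

my $reply = "HTTP/1.0 200 Connection established\015\012"
          . "Proxy-agent: Example/1.0\015\012\015\012";
open my $fh, '<', \$reply or die "open: $!";
my $code = read_connect_reply($fh);
print $code == 200 ? "tunnel up\n" : "CONNECT refused: $code\n";
```

        On a real IO::Socket::INET handle, keep using buffered I/O (print and readline) on the same handle afterwards: readline may already have buffered bytes that a later sysread would miss.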

        HTTP::Lite suffers from the same disease.

        request ( $url, $data_callback, $cbargs )
        Initiates a request to the specified URL.

        The host to connect to and the request to send are stuffed inside the $url parameter. I don't want to use URIs more than necessary; I just want to speak HTTP.

        HTTP::MHTTP also crams the host to connect to and the request into a URI, passed to http_call.

        Net::HTTPTunnel can do HTTP tunnelling through arbitrary TCP services. The only drawback is that I'll have to write my own HTTP, unless somehow one of these HTTP modules can be instructed to communicate on a given socket created by Net::HTTPTunnel. That's a reasonable trade-off, I suppose.
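        "Writing my own HTTP" on an already-open handle can be quite small for the simple case. The sketch below is hypothetical (get_over_handle is not from any module) and deliberately minimal: HTTP/1.0 only, no folded header lines, and the body is left unread for the caller.

```perl
use strict;
use warnings;
use IO::Handle;

# Hand-rolled HTTP/1.0 GET over handles that are already connected, e.g. a
# socket whose CONNECT tunnel is established. For a socket, pass the same
# handle as both $in and $out. Returns the status code and a hashref of
# response headers (lower-cased names); the body stays unread on $in.
sub get_over_handle {
    my ($in, $out, $url) = @_;
    my ($host) = $url =~ m{^http://([^/:]+)}i or die "bad URL: $url";
    print {$out} "GET $url HTTP/1.0\015\012",
                 "Host: $host\015\012",
                 "\015\012";
    $out->flush;                         # make sure the request leaves before we block
    my $status = <$in>;
    defined $status or die "connection closed before a response\n";
    my ($code) = $status =~ m{^HTTP/\d+\.\d+\s+(\d{3})} or die "bad status: $status";
    my %headers;
    while (defined(my $line = <$in>)) {
        $line =~ s/\015?\012\z//;
        last if $line eq '';             # blank line: headers done
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value if defined $value;
    }
    return ($code, \%headers);
}
```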

        PS I have only seen multiple levels of proxying used by people who were attempting to use various open proxies to anonymize themselves.
        RFC 2817 seems to disagree:
        It may be the case that the proxy itself can only reach the requested origin server through another proxy. In this case, the first proxy SHOULD make a CONNECT request of that next proxy, requesting a tunnel to the authority.
        I try to code for all relevant cases, and since I'm writing a program mainly for proxies this is relevant.

        It's definitely possible that someone will use an open proxy, but my program will also accept internal proxies; no distinction is made. Perhaps I could do a DNSRBL lookup, although such a check would slow down execution.
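        A DNS blocklist check usually works by reversing the IPv4 octets, appending the list's zone, and seeing whether that name resolves. A core-Perl sketch; the zone dnsbl.example.org and both helper names are placeholders, not a real list or module.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Blocklist convention: reversed IPv4 octets prepended to the list's zone.
# If that name resolves, the address is listed.
sub dnsbl_query_name {
    my ($ip, $zone) = @_;
    my @octets = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or die "not a dotted-quad IPv4 address: $ip";
    die "octet out of range: $ip" if grep { $_ > 255 } @octets;
    return join('.', reverse(@octets), $zone);
}

sub is_listed {
    my ($ip, $zone) = @_;
    # inet_aton resolves the name through the system resolver and returns
    # undef if it does not resolve; each call costs a DNS round trip.
    return defined inet_aton(dnsbl_query_name($ip, $zone)) ? 1 : 0;
}
```

        For example, dnsbl_query_name('192.0.2.10', 'dnsbl.example.org') returns '10.2.0.192.dnsbl.example.org'. Caching results per proxy address would limit the slowdown to the first check.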

        Having looked at the traffic that passed through one such proxy, the users paid attention to how much the server reported to others, and didn't seem to realize that the server they are proxying off of can keep logs including the information not passed on, and do things like hand it over to law enforcement... (Yes, I do know of at least one case where law enforcement took full advantage of this.)
        Thanks for the heads-up. I plan to take full advantage of this too, but that's another thread.