Marcello has asked for the wisdom of the Perl Monks concerning the following question:

Hi, for a new product I need to fetch pages from external sites in such a way that the user does not know where they come from, e.g.:
http://www.myserver.com/page.htm

is translated to

http://www.anotherserver.com/somedir/page.htm

The webserver we use is Apache, and I know there is a mod_proxy which can do this for me. However, we are unable to install this module, so we need to look for another way, perhaps Perl?

Is there a module in Perl which implements a proxy? I could write one myself, but that would take me weeks to complete. It has to support GET, POST, etc., because the external website may use forms.

Any ideas? TIA

Re: Perl proxy
by mattriff (Chaplain) on Feb 20, 2002 at 14:15 UTC
    search.cpan.org is an excellent place to look for things such as this.

    I found Apache::ReverseProxy there, which seems like it might do what you need.

    It requires mod_perl though -- since you mention you can't install mod_proxy I don't know if installing mod_perl is an option for you.

    - Matt Riffle

•Re: Perl proxy
by merlyn (Sage) on Feb 21, 2002 at 00:14 UTC
    I think you're confusing "repackaging content" (as you seem to want) with "providing HTTP proxy services" (as mod_proxy does it). The latter requires a change in the behavior of the client: knowing that it wants site A, it still asks site B to provide it.

    The "repackaging content" strategy is a difficult problem, because you have to rewrite all the URLs of the passed-through content, in whatever form they appear. Otherwise, the browser will end up fetching some stuff directly, possibly confusing everything. For example, URLs in A-HREF elements obviously need rewriting, but did you also consider the Location header for redirects, or cookie domains, or image maps, or even the URLs constructed by JavaScript or Java?
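
    To make the two simplest cases concrete, here is a minimal sketch in Perl of rewriting A-HREF attributes and a Location header. It assumes the host names from the original question; the sub names (rewrite_url, rewrite_links, rewrite_location) and the hash-ref representation of the headers are made up for illustration. It only handles double-quoted href attributes with a regex, so treat it as a starting point rather than a real solution: a proper version would use an HTML parser such as HTML::Parser, and it does nothing about cookie domains, image maps, or script-built URLs.

```perl
use strict;
use warnings;

# Assumed mapping, taken from the example URLs in the question.
my $REMOTE = 'http://www.anotherserver.com/somedir/';
my $LOCAL  = 'http://www.myserver.com/';

# Map an absolute remote URL back onto the local server.
# Anything else (relative links, other sites) is returned unchanged.
sub rewrite_url {
    my ($url) = @_;
    return $url unless index($url, $REMOTE) == 0;
    return $LOCAL . substr($url, length $REMOTE);
}

# Rewrite double-quoted href attributes inside <a ...> tags.
# A regex is only good enough for a sketch, not for arbitrary HTML.
sub rewrite_links {
    my ($html) = @_;
    $html =~ s{(<a\b[^>]*?\bhref\s*=\s*")([^"]*)(")}
              {$1 . rewrite_url($2) . $3}gie;
    return $html;
}

# Rewrite the Location header of a redirect, given headers as a hash ref.
sub rewrite_location {
    my ($headers) = @_;
    $headers->{Location} = rewrite_url($headers->{Location})
        if defined $headers->{Location};
    return $headers;
}

print rewrite_links(
    '<a href="http://www.anotherserver.com/somedir/form.htm">Form</a>'
), "\n";
# prints: <a href="http://www.myserver.com/form.htm">Form</a>
```

    Links to unrelated sites pass through untouched, which means the browser still fetches those directly.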

    It's a difficult problem. I hope you gain enough to recoup the investment in figuring out how to do it. I hope you're also considering the ethical, moral, and legal issues of branding someone else's content as your own.

    For a simple start, handling only the A-HREF and Location rewrites, see my column on a poor-man's CGI "proxy".

    -- Randal L. Schwartz, Perl hacker

Re: Perl proxy
by beebware (Pilgrim) on Feb 21, 2002 at 09:40 UTC
    The Personal Open Directory script does a very similar job: it reads the pages from dmoz.org, rewrites the URLs, rebrands the content as necessary, and lets sites such as my own offer the content without having to store several hundred megabytes of data. The code, while a little spaghettified, could serve as a good example of what you want to achieve.