tomazos has asked for the wisdom of the Perl Monks concerning the following question:

What I'd like to do, within Apache/mod_perl (specifically a Mason component), is download a web page from a third-party web server with some kind of HTTP client library (LWP?), then add to, filter or otherwise munge the HTML, and re-serve it as the output of the PerlHandler (Mason component).

I guess it is a sort of HTTP server/client proxy thing.

My question is: (A) is there an existing module that is purpose-built for embedding an HTTP client inside mod_perl? Or, if not, (B) what would be the best approach? Just load LWP? Is there a lighter-weight solution, given the premium on memory in an Apache child process?

Any ideas appreciated. -Andrew.


Andrew Tomazos  |  andrew@tomazos.com  |  www.tomazos.com

Re: Embedding LWP inside mod_perl (+Mason)
by Fletch (Bishop) on May 30, 2005 at 17:41 UTC

    You probably just want LWP and one of the HTML munging modules (probably HTML::TreeBuilder). You can reduce your resource usage somewhat by:

    • remembering to free the parsed tree when you're done with it ($tree->delete, to break any reference cycles)
    • taking advantage of Mason's caching if possible (which will use more local disk and/or RAM, depending on which Cache::Cache setup you've configured, but will require less processing and network traffic if you can avoid another LWP fetch and parse)
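
A rough sketch of the fetch, munge and free steps above, not a definitive implementation. The helper names munge_html and fetch_and_munge, the 10-second timeout, and the title rewrite are all made up for illustration; only the LWP::UserAgent and HTML::TreeBuilder calls come from those modules.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;

# Hypothetical helper: parse $html, apply one illustrative change,
# and return the re-serialized HTML.
sub munge_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Illustrative munging only: replace the text of the <title> element.
    if ( my $title = $tree->look_down( _tag => 'title' ) ) {
        $title->delete_content;
        $title->push_content('Proxied page');
    }

    my $out = $tree->as_HTML;
    $tree->delete;    # break reference cycles so the tree's memory is released
    return $out;
}

# Hypothetical helper: fetch $url with LWP and munge the body.
# Returns nothing (undef in scalar context) if the request fails.
sub fetch_and_munge {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new( timeout => 10 );    # timeout is an assumed value
    my $res = $ua->get($url);
    return unless $res->is_success;
    return munge_html( $res->decoded_content );
}
```

Inside a Mason component the result could then be emitted with something like $m->print( fetch_and_munge($url) ). Storing the munged string through $m->cache would be one way to skip the fetch and parse on repeat requests, assuming a cache backend is configured.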

    If you're really worried, you might write a separate custom proxy using HTTP::Proxy or POE and then use Apache's mod_proxy to make those results appear under your server's URI namespace.

    --
    We're looking for people in ATL