rastoboy has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm pondering a project which would involve fetching random web pages and displaying them via CGI script. A most vexing puzzle for me has been how to do this elegantly.

For example, if I fetch the page via WWW::Mechanize, I need to find every img tag, href, and anything else that uses a relative path, and prepend the site's base URL to it before passing the page on to the browser, if I want it to look "right".
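The rewriting step described above can be sketched roughly as follows. This is only an illustration, written in Python with the standard library rather than Perl, and it is not what WWW::Mechanize does for you. The class name `LinkAbsolutizer`, the helper `absolutize`, and the short `URL_ATTRS` list are made up for this example. It handles only a few HTML attributes and leaves URLs inside CSS and JavaScript untouched. In Perl, the URI module's `URI->new_abs($relative, $base)` performs the same resolution as `urljoin` here.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry URLs; an illustrative, incomplete list.
URL_ATTRS = {"href", "src", "action"}


class LinkAbsolutizer(HTMLParser):
    """Re-emit HTML with relative href/src/action values made absolute."""

    def __init__(self, base_url):
        # Keep character references as written so text is passed through as-is.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _emit_tag(self, tag, attrs, closer):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # bare attribute such as "disabled"
                continue
            if name in URL_ATTRS:
                # Resolve the reference against the page's base URL.
                value = urljoin(self.base_url, value)
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def absolutize(html, base_url):
    """Return html with relative URL attributes resolved against base_url."""
    parser = LinkAbsolutizer(base_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

A lighter alternative is to insert a `<base href="...">` element into the fetched page's head, which leaves the resolution of relative URLs to the browser. Neither approach helps with sites that refuse to serve images to foreign referrers.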

But this gets very complicated very fast when dealing with JavaScript, CSS, and other assorted unexpected references, not to mention sites that guard against others using their images. I've even contemplated taking a screenshot of the page, but the Perl modules I found depend on X Window based tools like Mozilla or WebKit, and I really don't want X on my Linux server at all. A screenshot also wouldn't fit my project, which involves making relevant changes on the fly, such as highlights or underlines.

Is there an elegant way to fetch a page, and re-display it reliably?

Replies are listed 'Best First'.
Re: Re-rendering web pages
by gmargo (Hermit) on Nov 28, 2009 at 16:52 UTC
      proxies, cool--thanks, y'all!

      I'll post a link when I'm done. I'm not trying to steal anybody's work at all :-P

        If you are still considering screenshots (maybe for a thumbnail preview of whatever you display), there are plenty of web services available, some free and some for a small fee.


        holli

        You can lead your users to water, but alas, you cannot drown them.
Re: Re-rendering web pages
by WizardOfUz (Friar) on Nov 28, 2009 at 12:16 UTC

    Have you considered using frames?

    Update: Sorry, I missed the "making relevant changes" part of your question. Frames alone will probably not solve your problem.

      Frames rarely solve problems. Most of the time, they create a huge load of new problems, starting with breaking bookmarks, search engines, and page designs. The same applies to iframes. Don't use frames and iframes for new web pages.

      The OP wants a showcase effect. I can't imagine any good use for that. JavaScript code to break out of foreign frames has become common precisely because people don't want to see their work in the showcase of some bad guy who pretends it's his own.

      For a web designer wanting to show his work, standard hyperlinks to his client's web sites should be sufficient. Adding a target attribute to open a new window or tab may be useful, but is completely optional. A simple screenshot is often sufficient, too.

      For web sites that want to share content on the basis of some kind of contract, parsing and rewriting HTML and JS should not be needed. The source web server simply offers an interface from which the content (and required meta-data) can be fetched in a form that is easily usable by the showcase web server. JSON and XML could be used as data formats; FTP and HTTP could be used to transport the content and meta-data. Passwords, SSL certificates, and IP address checks could be used to limit access to the content and meta-data. Transforming the content to (X)HTML is left to the destination web server.
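As a rough illustration of the last step in that arrangement, the destination server could turn a JSON record into (X)HTML along these lines. This is a sketch in Python using only the standard library. The function name `render_shared_item` and the field names `title`, `body`, and `source_url` are invented for this example, and a real contract between two sites would define its own format. Fetching and authentication are left out.

```python
import json
from html import escape


def render_shared_item(payload):
    """Turn one JSON record from the source server into an HTML fragment."""
    item = json.loads(payload)
    # Escape everything: the record carries plain content, not markup.
    title = escape(item["title"])
    body = escape(item["body"])
    source = escape(item["source_url"], quote=True)
    return (
        '<div class="shared-item">'
        "<h2>%s</h2><p>%s</p>"
        '<p><a href="%s">Original</a></p>'
        "</div>" % (title, body, source)
    )
```

Because the destination server builds the markup itself, there are no relative links or foreign scripts to rewrite.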

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Re-rendering web pages
by Anonymous Monk on Nov 28, 2009 at 13:05 UTC