grabbing html source

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have some dynamically generated web pages which I'd like to give the user a chance to email (an *email this page to someone* button). Given that regenerating the html content for the email would be silly, I am trying to grab the source of the page the user is seeing, which I will then parse as necessary and build the email out of. I've read through a bunch of modules (including LWP and HTML::Parse) and couldn't come up with anything. Any ideas? Thanks.

Re: grabbing html source by dorward (Curate) on Sep 09, 2005 at 13:26 UTC
1. User requests page
2. Server generates page
3. Server sends page to user
4. Browser renders page

At this stage (although I'm simplifying a little), the page exists only in the browser's cache, and there is no way for you to retrieve it from the user's browser (short of playing around with JavaScript to post the entire page back to the server, but that would be even sillier than generating the data again, not to mention unreliable).

You might want to consider doing what the vast majority of websites do and allow the user to just copy and paste the URL from their address bar into their email client. Saying "Please trust us with your friend's email address" can lead to concerns that you are harvesting addresses to spam.
Re^2: grabbing html source by Anonymous Monk on Sep 09, 2005 at 14:01 UTC
"You might want to consider doing what the vast majority of websites do and allow the user to just copy and paste the URL from their address bar into their email client"

Not feasible. The HTML in question is on an intranet, not the open net. I need to email the page the user sees.
Re: grabbing html source by marto (Cardinal) on Sep 09, 2005 at 13:52 UTC
Hi,

From the LWP documentation:

    # Create a user agent object
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");

    # Create a request
    my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('query=libwww-perl&mode=dist');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    }
    else {
        print $res->status_line, "\n";
    }

Based on the above code you could employ MIME::Lite to email the user.

Update: Please keep in mind the issues mentioned in dorward's post.

Hope this helps,
Martin
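As a rough illustration of the MIME::Lite step, here is a minimal sketch that wraps already-fetched page source (for instance the `$res->content` from the LWP example) in an HTML email. The helper name `build_page_email` and all addresses are made up for the example; only `MIME::Lite->new`, `as_string` and `send` are the module's own calls.

```perl
use strict;
use warnings;
use MIME::Lite;

# Build (but do not send) an HTML email from page source that has
# already been fetched. build_page_email is a hypothetical helper.
sub build_page_email {
    my (%args) = @_;
    return MIME::Lite->new(
        From    => $args{from},
        To      => $args{to},
        Subject => $args{subject},
        Type    => 'text/html',
        Data    => $args{html},
    );
}

# Placeholder addresses and markup, for illustration only.
my $msg = build_page_email(
    from    => 'webapp@intranet.example',
    to      => 'colleague@intranet.example',
    subject => 'A page you might want to see',
    html    => '<html><body><h1>Report</h1></body></html>',
);

# Inspect the message; call $msg->send instead to actually deliver it.
print $msg->as_string;
```

Printing the message first makes it easy to check the headers before anything leaves the server.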
Re: grabbing html source by Arunbear (Prior) on Sep 09, 2005 at 15:27 UTC
Regenerating the content is not silly. Consider this very thread: PerlMonks can (re)generate it in different ways (the normal view, XML, and print).

Using a templating system, e.g. Template Toolkit (see also merlyn's articles on Template Toolkit), can make it easy to create browser and email views of your content based on reusable components.
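To make the "browser and email views" idea concrete, here is a small sketch using Template Toolkit with two inline templates fed from the same data. The template text, the `%view` hash and the `render` helper are all invented for the example; a real application would more likely keep its templates in files.

```perl
use strict;
use warnings;
use Template;

# One set of data, two views. Both templates are hypothetical.
my %view = (
    browser => '<html><body><h1>[% title | html %]</h1>'
             . '<p>[% body | html %]</p></body></html>',
    email   => "[% title %]\n\n[% body %]\n",
);

# Render the named view with the given variables and return the text.
sub render {
    my ($which, $vars) = @_;
    my $tt = Template->new or die Template->error;
    my $out = '';
    $tt->process(\$view{$which}, $vars, \$out) or die $tt->error;
    return $out;
}

my $data = { title => 'Sales & Returns', body => 'Figures for this week.' };
print render('browser', $data), "\n";
print render('email',   $data);
```

The browser view escapes the data with the `html` filter, while the email view is plain text, so neither has to be scraped from the other.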
Re: grabbing html source by scmason (Monk) on Sep 09, 2005 at 13:48 UTC
Dorward is absolutely right. However, if the user is willing to give you their friend's email address, you could just generate an email that says something like:

    Hello Friend

    Dorward wants you to see this page at SomeSite.com.
    Visit http://someurl.com/that/you/gen to view the page.

    Sincerely
    SomeName

Like Dorward's solution, this is a win-win situation, because you also get the 'visit' to your page, which helps with stats.

"Never take yourself too seriously, because everyone knows that fat birds don't fly" -FLC
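The notification email described above could be put together with nothing but core Perl. This is only a sketch: the helper name `link_email_body`, its argument names and the example URL are assumptions, not anything from the original post.

```perl
use strict;
use warnings;

# Compose the plain-text body of a "someone wants you to see this page"
# email. link_email_body and its argument names are hypothetical.
sub link_email_body {
    my (%args) = @_;
    return <<"END_BODY";
Hello $args{friend}

$args{sender} wants you to see this page at $args{site}.
Visit $args{url} to view the page.

Sincerely
$args{site}
END_BODY
}

# Placeholder values, for illustration only.
print link_email_body(
    friend => 'Friend',
    sender => 'Dorward',
    site   => 'SomeSite.com',
    url    => 'http://intranet.example/report/42',
);
```

The resulting string can be handed to whatever mailer is already in use as the message body.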
Re: grabbing html source by b10m (Vicar) on Sep 09, 2005 at 15:01 UTC
I'd go for this solution also, for the following reasons:

- No need to download the page from your own server
- Way more friendly for the recipient:
  - Smaller email
  - User might not have a client that can handle HTML

And if the URLs are terribly long, you might even throw them through one of the WWW::Shorten modules.

--
b10m

All code is usually tested, but rarely trusted.
Re: grabbing html source by cbrandtbuffalo (Deacon) on Sep 09, 2005 at 20:43 UTC
I agree that it's the best approach to re-generate the page since you have indicated you can't send a link. Unless each page puts some very heavy load on the server, regenerating it shouldn't be a problem.

Another issue to think about is how big the pages might be. Sending really large emails from your web server might end up creating more challenges than re-requesting a page. Also, if the recipients may not have access to the intranet content, you'll have to work around the fact that images and other included content won't be seen. Maybe you only need the text of the page, so it won't matter. But if things like images are important to the look of the page, you'll need to send the page and the images. That could make for a fairly large email.
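For the case where the images have to travel with the page, one possible shape is a multipart/related message built with MIME::Lite, with the HTML referring to the image by a `cid:` URL. This is a sketch under assumptions: the helper `build_page_with_image`, the addresses, and the image bytes (a stand-in string, not a real GIF) are all invented for the example.

```perl
use strict;
use warnings;
use MIME::Lite;

# Bundle an HTML page and one inline image into a single message.
# build_page_with_image is a hypothetical helper.
sub build_page_with_image {
    my (%args) = @_;
    my $msg = MIME::Lite->new(
        From    => $args{from},
        To      => $args{to},
        Subject => $args{subject},
        Type    => 'multipart/related',
    );
    # The HTML part points at the image with src="cid:<image_id>".
    $msg->attach(Type => 'text/html', Data => $args{html});
    $msg->attach(
        Type     => 'image/gif',
        Id       => $args{image_id},
        Encoding => 'base64',
        Data     => $args{image_data},
    );
    return $msg;
}

# Placeholder bytes standing in for a real image file's contents.
my $fake_image = 'GIF89a' . ("\0" x 64);

my $msg = build_page_with_image(
    from       => 'webapp@intranet.example',
    to         => 'colleague@intranet.example',
    subject    => 'Report with chart',
    html       => '<html><body><img src="cid:chart.gif"></body></html>',
    image_id   => 'chart.gif',
    image_data => $fake_image,
);

# Checking the size before sending helps catch oversized messages.
printf "Message size: %d bytes\n", length($msg->as_string);
```

Because base64 inflates binary data by about a third, measuring the finished message gives a more honest picture of the email size than adding up the file sizes.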
