in reply to Removing selective tags and content between

Try HTML::TokeParser::Simple. It will handle most of your parsing needs.

use strict;
use warnings;
use HTML::TokeParser::Simple;

# Slurp the sample document that follows __END__
my $page   = do { local $/; <DATA> };
my $parser = HTML::TokeParser::Simple->new(\$page);

my $html = '';
$parser->get_tag('body');    # skip to first body tag
while (my $token = $parser->get_token) {
    last if $token->is_end_tag('body');
    $html .= $token->as_is;  # keep each token exactly as written
}
print $html;

__END__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>test</title>
</head>
<body>
<h1>headline</h1>
<p>Content</p>
</body>
</html>
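If you need the same extraction in more than one place, the loop can be wrapped in a sub. This is only a minimal sketch: extract_body is a hypothetical helper name of my own, not something the module provides, and it assumes the whole page is already in a scalar. It also returns an empty string when the page has no body tag, since get_tag returns undef in that case.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: return everything between <body> and </body>.
sub extract_body {
    my ($markup) = @_;
    my $parser = HTML::TokeParser::Simple->new(\$markup);

    # get_tag returns undef if no <body> tag is ever found
    $parser->get_tag('body') or return '';

    my $html = '';
    while (my $token = $parser->get_token) {
        last if $token->is_end_tag('body');
        $html .= $token->as_is;    # append the token exactly as it appeared
    }
    return $html;
}

print extract_body('<html><body><p>Content</p></body></html>'), "\n";
```

Anything that ends up in a scalar, such as the content of a fetched page, can be passed to the sub in the same way.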

Cheers,
Ovid

New address of my CGI Course.

Re: Re: Removing selective tags and content between
by diamich (Initiate) on Oct 15, 2003 at 14:33 UTC
    Thanks for the reply Ovid. Can I ask that you be a little more explicit? I'm TOTALLY Perl illiterate and since your solution doesn't use any of the variables I have, I can't even hazard a guess as to where I should put it or which variables I should change to match mine (or vice versa). Additionally, I should probably add that the two domains are on different hosts if that makes a difference.
      use LWP::UserAgent;
      use HTML::TokeParser::Simple;

      # Pass the visitor's user agent and referer on to the remote host
      $uatopasson      = $ENV{"HTTP_USER_AGENT"};
      $referertopasson = $ENV{"HTTP_REFERER"};

      $ua = LWP::UserAgent->new;
      $ua->agent($uatopasson);

      # $file is the URL variable from your existing script
      $req = HTTP::Request->new(GET => "$file");
      $req->header('referer' => $referertopasson);
      $res = $ua->request($req);
      $webpage = $res->content;

      print "Content-type: text/html\n\n";

      $parser = HTML::TokeParser::Simple->new(\$webpage);
      $html = '';
      $parser->get_tag('body');    # skip to first body tag
      while (my $token = $parser->get_token) {
          last if $token->is_end_tag('body');
          $html .= $token->as_is;
      }
      print $html;

      I would also like to point out that I've very reluctantly left off "use strict" and "use warnings". Check the link to my CGI course (below) for more information.

      Cheers,
      Ovid

      New address of my CGI Course.