Re: Removing selective tags and content between

Try HTML::TokeParser::Simple. It will handle most of your parsing needs.

use strict;
use warnings;

use HTML::TokeParser::Simple;
my $page   = do { local $/; <DATA> };
my $parser = HTML::TokeParser::Simple->new(\$page);

my $html = '';

$parser->get_tag('body'); # skip to first body tag
while (my $token = $parser->get_token) {
    last if $token->is_end_tag('body');
    $html .= $token->as_is;
}
print $html;


__END__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
    <head>
        <title>test</title>
    </head>
    <body>
        <h1>headline</h1>
        <p>Content</p>
    </body>
</html>

Cheers,
Ovid

New address of my CGI Course.

Re: Re: Removing selective tags and content between by diamich (Initiate) on Oct 15, 2003 at 14:33 UTC
Thanks for the reply Ovid. Can I ask that you be a little more explicit? I'm TOTALLY Perl illiterate and since your solution doesn't use any of the variables I have, I can't even hazard a guess as to where I should put it or which variables I should change to match mine (or vice versa). Additionally, I should probably add that the two domains are on different hosts if that makes a difference.
Re: Re: Re: Removing selective tags and content between by Ovid (Cardinal) on Oct 15, 2003 at 16:07 UTC
use LWP::UserAgent;               # also loads HTTP::Request; harmless if your script already has it
use HTML::TokeParser::Simple;

$uatopasson      = $ENV{"HTTP_USER_AGENT"};
$referertopasson = $ENV{"HTTP_REFERER"};

# fetch the remote page, passing along the visitor's user agent and referer
$ua = LWP::UserAgent->new;
$ua->agent($uatopasson);
$req = HTTP::Request->new (GET => "$file");   # $file comes from your existing script
$req->header('referer' => $referertopasson);
$res = $ua->request($req);
$webpage = $res->content;

print "Content-type: text/html\n\n";

$parser = HTML::TokeParser::Simple->new(\$webpage);
$html   = '';

$parser->get_tag('body'); # skip to first body tag
while (my $token = $parser->get_token) {
    last if $token->is_end_tag('body');
    $html .= $token->as_is;
}
print $html;

I would also like to point out that I've very reluctantly left off "use strict" and warnings. Check the link to my CGI course (below) for more information.

Cheers,
Ovid

New address of my CGI Course.
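
For the "selective" part of the question, the same token loop can also drop one kind of tag together with everything between its start and end tags. What follows is only a minimal sketch: the choice of <script> as the tag to remove, the $skipping flag, and the sample page are assumptions made for illustration, not something from the original script.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample page; in the real script this would be $webpage.
my $page = <<'END_HTML';
<html>
    <body>
        <h1>headline</h1>
        <script type="text/javascript">alert("hi");</script>
        <p>Content</p>
    </body>
</html>
END_HTML

my $parser   = HTML::TokeParser::Simple->new(\$page);
my $html     = '';
my $skipping = 0;   # true while we are inside the tag being removed

$parser->get_tag('body');   # skip to first body tag
while (my $token = $parser->get_token) {
    last if $token->is_end_tag('body');

    # Drop the start tag, the end tag, and everything in between.
    if ($token->is_start_tag('script')) { $skipping = 1; next; }
    if ($token->is_end_tag('script'))   { $skipping = 0; next; }

    $html .= $token->as_is unless $skipping;
}
print $html;
```

Changing 'script' to another tag name should work the same way, provided that tag is not nested inside itself; nested tags of the same name would need a counter rather than a simple flag.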