Re: How would I extract body from an html page

HTML::Element's $ele->detach_content could help you with that: it removes all of an element's child nodes and returns them as a list.

#! /usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;
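# Slurp the sample document from the DATA section and parse it into a tree.
my $t = HTML::TreeBuilder->new_from_content(do { local $/; <DATA> });
# Find the <body> element.
my $body = $t->look_down(_tag => q{body});
# Detach all of body's children and keep them as a list.
my @content = $body->detach_content;

# Serialize each child. Note: text sitting directly under <body> comes back
# as a plain string, not an object, and would need separate handling here.
print $_->as_HTML for @content;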

__DATA__
<html>
<head><title>title</title></head>
<body>
<h1>heading one</h1>
<p>paragraph <b>bold</b></p>
<p>paragraph</p>
</body>
</html>
Output:

<h1>heading one</h1>
<p>paragraph <b>bold</b>
<p>paragraph

See also rhesa's snippet for a discussion of optional tags and XHTML empty tags, if that is a concern.
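
If the dropped </p> end tags in the output above are a problem, one possible approach (a minimal sketch, not rhesa's snippet) is to pass an empty hashref as the third argument to as_HTML, which tells it that no end tags are optional, or to serialize with as_XML to get XHTML-style empty tags. The helper name body_html and its xml option are made up for this illustration; the entity set '<>&' is just one reasonable choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

# Hypothetical helper (not from the original post): returns the children
# of <body> as a string, with every end tag written out.
# Pass xml => 1 to serialize with as_XML instead of as_HTML.
sub body_html {
    my ($html, %opt) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $body = $tree->look_down(_tag => 'body');
    my @content = $body->detach_content;

    my $out = join '', map {
        !ref $_    ? encode_entities($_, '<>&')      # plain text child
      : $opt{xml}  ? $_->as_XML                      # empty tags as <br />
      :              $_->as_HTML('<>&', undef, {})   # {} = no optional end tags
    } @content;

    $tree->delete;    # release what is left of the tree
    return $out;
}

print body_html(
    '<html><body><h1>heading one</h1><p>paragraph <b>bold</b></p></body></html>'
);
print body_html(
    '<html><body><p>line<br>break</p></body></html>',
    xml => 1,
);
```

The ref check is there because detach_content returns text nodes as plain strings, so calling as_HTML on them would die.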

Re^2: How would I extract body from an html page by corpx (Acolyte) on Jul 25, 2009 at 19:34 UTC
Thanks guys :)