Parsing HTML using TreeBuilder

monsterzero has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am trying to use HTML::TreeBuilder to parse some HTML data I have retrieved from the web. Specifically, I would like to extract the sunrise/sunset data from the web page. Below is what I have tried. What I am looking for is everything between the pre tags. However, I am afraid I do not understand what is being displayed when I print the result of the all_attr method :(

Can anyone shed some light on this?

Thanks

Ron Hill


use strict;
use warnings;
use HTML::TreeBuilder;

my $data = do { local $/; <DATA> };

my $tree = HTML::TreeBuilder->new_from_content($data);

print $tree->all_attr();
__DATA__
<html>
<head><title>Sun and Moon Data for One Day</title></head>
<body>
<br>
<h4>U.S. Naval Observatory<br>Astronomical Applications Department</h4>
<br>
<h3>Sun and Moon Data for One Day</h3>

 <p>The following information is provided for Adelaide Australia

 (longitude E138.6, latitude S34.9): </p>
 <pre>
        Saturday
        21 June 2003          Universal Time + 9h

                         <strong>SUN</strong>
        Begin civil twilight      06:25
        Sunrise                   06:53
        Sun transit               11:47
        Sunset                    16:41
        End civil twilight        17:10

                         <strong>MOON</strong>
        Moonrise                  22:45 on preceding day
        Moon transit              05:24
        Moonset                   11:53
        Moonrise                  23:43
        Moonset                   12:19 on following day

 </pre>

 <p>Last quarter  Moon on 21 June      2003 at 23:45
 (Universal Time + 9h).              </p>


<br>
<br>
<br>
</body>
</html>

Edit by tye, replace PRE with P tags


Replies are listed 'Best First'.

Re: Parsing HTML using TreeBuilder
by Art_XIV (Hermit) on Oct 28, 2003 at 20:27 UTC

You're focusing on all_attr() which probably isn't going to be of much use to you at this stage.
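To illustrate why the printed output looks confusing, here is a minimal sketch. The one-line HTML string and the class="data" attribute are made up for the example. As far as I can tell, all_attr() returns the element's whole attribute hash as a flat list, so printing it directly runs names and values together, and that hash also holds internal keys such as _tag, _parent and _content whose values can be references.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical one-line document, just to have a <pre> with one attribute.
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><pre class="data">Sunrise 06:53</pre></body></html>'
);
my $pre = $tree->look_down(_tag => 'pre')
    or die "No <pre> element found\n";

# all_attr() gives name/value pairs; internal names start with an underscore.
my %attr = $pre->all_attr();
print "all_attr key: $_\n" for sort keys %attr;

# all_external_attr() keeps only the attributes written in the HTML source.
my %external = $pre->all_external_attr();
print "external: $_ = $external{$_}\n" for sort keys %external;

$tree->delete;    # free the tree when finished
```

So the attributes describe the tag itself, not the text between the tags, which is why they do not help with the sunrise/sunset values.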

You probably want something like:

use strict;
use warnings;
use HTML::TreeBuilder;

my $data = do { local $/; <DATA> };

my $tree = HTML::TreeBuilder->new_from_content($data);
my $pre_tag = $tree->look_down("_tag", "pre");
print $pre_tag->as_text(), "\n";

Take a look at the docs for HTML::Tree::Scanning for a basic tutorial, and then have a look at the docs for HTML::Element, since HTML::TreeBuilder uses it heavily. HTML::Element has a lot of methods that will be useful to you.
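Once you have the text of the pre element, pulling out the sunrise and sunset times is a matter of a regex. The following is only a sketch under assumptions: the inline HTML is a trimmed copy of the sample page, the %times hash is a name made up for the example, and the pattern assumes each label is followed by whitespace and an HH:MM time, as in the sample.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Trimmed copy of the sample page; the real page has more rows.
my $html = <<'HTML';
<html><body>
<pre>
                         <strong>SUN</strong>
        Begin civil twilight      06:25
        Sunrise                   06:53
        Sun transit               11:47
        Sunset                    16:41
        End civil twilight        17:10
</pre>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $pre  = $tree->look_down(_tag => 'pre')
    or die "No <pre> element found\n";
my $text = $pre->as_text();
$tree->delete;    # the text is already copied, so the tree can go

# Collect label/time pairs; "Sun transit" is skipped because the
# alternation only accepts the full words Sunrise and Sunset.
my %times;
while ($text =~ /\b(Sunrise|Sunset)\s+(\d{2}:\d{2})/g) {
    $times{$1} = $2;
}

print "$_: $times{$_}\n" for sort keys %times;
```

With the sample data this prints "Sunrise: 06:53" and "Sunset: 16:41". If the page layout changes, the regex is the part most likely to need adjusting.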
