Re^2: Parsing HTML

Thanks for the replies. I do appreciate it. Here is what I have so far:

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Lite;
use HTML::TreeBuilder;

my $http = HTTP::Lite->new;

my $req = $http->request("http://has.ncdc.noaa.gov/pls/plhas/has.dsselect")
    or die "Unable to get document: $!";
die "Request failed ($req): " . $http->status_message() if $req ne "200";
my $body = $http->body();

my $t = HTML::TreeBuilder->new_from_content($body)
    or die qq{can't build tree\n};

my @anchors = $t->look_down(_tag => q{a});
for my $anchor (@anchors){
    if ($anchor->as_text eq q{NEXRAD Level III}){
        $anchor->replace_with_content;
    }
}

print $t->as_HTML(
    undef,
    q{  },
    {},
);

$t->delete;

This, of course, is not much different from the code above. I just don't understand this HTML stuff, and I can't seem to find any pages that really help. I want to take the value of the href when the link text is Level III and put it into a variable. Is href a node, a namespace, or what?

Re^3: Parsing HTML
by wfsp (Abbot) on Jun 16, 2009 at 17:57 UTC

my @anchors = $t->look_down(_tag => q{a});
my @hrefs;
for my $anchor (@anchors){
    if ($anchor->as_text eq q{NEXRAD Level III}){
        push @hrefs, $anchor->attr(q{href});
        $anchor->replace_with_content;
    }
}
print qq{$_\n} for @hrefs;

HAS.FileAppSelect?datasetname=7000

An href is an attribute of an HTML element, hence $anchor->attr(q{href}) :-)

Have a look at the HTML::Element docs to see what the look_down, as_text, attr and replace_with_content methods do.

update

heh! Looking back at your original question I saw

I want to remove the href...

and I read that as removing the anchor tag from the HTML. :-)

Did you mean you wanted to get/extract the hrefs and store them in an array? If so you don't need the

$anchor->replace_with_content;

line. I'll get there in the end. :-)
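The update above asks whether the goal was only to extract the hrefs. A minimal sketch of that extract-only variant follows, assuming only the first "NEXRAD Level III" link is wanted and its href should end up in a single scalar. It replaces the loop with a single look_down call that takes a code-ref criterion (look_down returns the first matching element in scalar context), and it never calls replace_with_content, so the tree is left unchanged. The inline HTML is hypothetical stand-in markup for $http->body(); only the datasetname=7000 href echoes the output shown above, and the other link is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the fetched page; in the real script
# this would be the $body returned by $http->body().
my $body = <<'HTML';
<html><body>
<a href="other.html">NEXRAD Level II</a>
<a href="HAS.FileAppSelect?datasetname=7000">NEXRAD Level III</a>
</body></html>
HTML

my $t = HTML::TreeBuilder->new_from_content($body);

# Criteria are ANDed: the element must be an <a> tag AND the code ref
# must return true for it. In scalar context look_down returns the
# first such element (or undef if none matches).
my $anchor = $t->look_down(
    _tag => 'a',
    sub { $_[0]->as_text eq 'NEXRAD Level III' },
);

# href is an attribute of the <a> element, not a node of its own,
# so it is read with attr(). Nothing in the tree is modified.
my $href = $anchor ? $anchor->attr('href') : undef;

print defined $href ? "$href\n" : "No matching link found\n";

$t->delete;    # free the tree explicitly, as the original script does
```

With the hypothetical markup above this should print HAS.FileAppSelect?datasetname=7000. Note that as_text compares the anchor's full text exactly, so extra whitespace or different wording on the live page would make the match fail.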
