Thanks for the replies. I do appreciate it. Here is what I have so far:
#!/usr/bin/perl -w
use strict;
use HTTP::Lite;
use HTML::TreeBuilder;

my $http = new HTTP::Lite;
my $req  = $http->request("http://has.ncdc.noaa.gov/pls/plhas/has.dsselect")
    or die "Unable to get document: $!";
die "Request failed ($req): " . $http->status_message() if $req ne "200";

my $body = $http->body();
my $t = HTML::TreeBuilder->new_from_content($body)
    or die qq{cant build tree: $!\n};

my @anchors = $t->look_down(_tag => q{a});
for my $anchor (@anchors) {
    if ($anchor->as_text eq q{NEXRAD Level III}) {
        $anchor->replace_with_content;
    }
}

print $t->as_HTML(undef, q{ }, {});
$t->delete;
This, of course, is not much different from the code above. I just don't understand this HTML stuff, and I can't seem to find any pages that really help. I want to take the text contained in the href when the link text is Level III and put it into a variable. Is href a node, a namespace, or what?

Re^3: Parsing HTML
by wfsp (Abbot) on Jun 16, 2009 at 17:57 UTC
    my @anchors = $t->look_down(_tag => q{a});
    my @hrefs;
    for my $anchor (@anchors) {
        if ($anchor->as_text eq q{NEXRAD Level III}) {
            push @hrefs, $anchor->attr(q{href});
            $anchor->replace_with_content;
        }
    }
    print qq{$_\n} for @hrefs;

    Output:

    HAS.FileAppSelect?datasetname=7000
    An href is an attribute of an HTML element, not a node or a namespace, hence $anchor->attr(q{href}) :-)

    Have a look at the HTML::Element docs to see what the look_down, as_text, attr and replace_with_content methods do.
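
    To see those four methods side by side, here is a minimal sketch run against a small made-up page rather than the real NOAA one. The markup, link texts and file names are assumptions for illustration only; the method calls are the ones already used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up HTML standing in for the real page (an assumption for illustration).
my $html = <<'HTML';
<html><body>
<p><a href="first.html">First link</a></p>
<p><a href="second.html">Second link</a></p>
</body></html>
HTML

my $t = HTML::TreeBuilder->new_from_content($html);

# look_down: in list context, returns every element matching the criteria.
my @anchors = $t->look_down(_tag => q{a});

# as_text: the text inside the element; attr: the value of an attribute.
my @texts = map { $_->as_text }       @anchors;
my @hrefs = map { $_->attr(q{href}) } @anchors;

# replace_with_content: removes the tag itself but leaves its text in the tree.
$anchors[0]->replace_with_content;
my $remaining = () = $t->look_down(_tag => q{a});

print qq{$texts[$_] -> $hrefs[$_]\n} for 0 .. $#anchors;
print qq{anchors left after replace_with_content: $remaining\n};

$t->delete;
```

    Note that the href values are read before replace_with_content is called, the same order as in the loop above.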

    update

    heh! Looking back at your original question I saw

    I want to remove the href...
    and I read that as removing the anchor tag from the HTML. :-)

    Did you mean you wanted to get/extract the hrefs and store them in an array? If so, you don't need the

    $anchor->replace_with_content;
    line. I'll get there in the end. :-)
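
    If extraction is all you want, something along these lines might do. It is only a sketch: the inline markup is a stand-in for what $http->body() returns, the "Level II" link and its datasetname are invented so there is something to skip over, and $href is just an illustrative variable name. It passes a code ref to look_down as an extra criterion, so the text test happens inside the search, and it leaves the tree untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Stand-in markup (an assumption); the real page would come from $http->body().
# The "Level II" link is invented so that there is a non-matching anchor.
my $body = <<'HTML';
<html><body>
<a href="HAS.FileAppSelect?datasetname=6500">NEXRAD Level II</a>
<a href="HAS.FileAppSelect?datasetname=7000">NEXRAD Level III</a>
</body></html>
HTML

my $t = HTML::TreeBuilder->new_from_content($body);

# In scalar context look_down returns the first match, or undef if none.
# The code ref receives each candidate element as its only argument.
my $anchor = $t->look_down(
    _tag => q{a},
    sub { $_[0]->as_text eq q{NEXRAD Level III} },
);

# Keep the attribute value in a plain scalar; nothing is removed from the tree.
my $href = $anchor ? $anchor->attr(q{href}) : undef;

print defined $href ? qq{$href\n} : qq{no matching link found\n};

$t->delete;
```

    Checking $anchor before calling attr avoids a "Can't call method on an undefined value" error if the page layout changes and the link is no longer there.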