spx2 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: HTML::TreeBuilder documentation tests trouble
by GrandFather (Saint) on Jul 08, 2007 at 00:08 UTC

    Show us a small, self-contained, runnable sample that demonstrates the problem. Take a look at "I know what I mean. Why don't you?" for some ideas about how to do that.

    Ask us about your larger problem, not the small detail that you perceive as the immediate blocker. We can give much better answers when we know the bigger picture.


    DWIM is Perl's answer to Gödel
Re: HTML::TreeBuilder documentation tests trouble
by Cody Pendant (Prior) on Jul 08, 2007 at 04:41 UTC
    Here's a complete guess as to what you mean.

    You're looking for something like a TD element with the class "important" and you want the text content of the TD preceding it.

    This would do it:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new;    # empty tree
    $tree->parse_file( \*DATA );

    my $matching_td = $tree->look_down( "_tag", "td", "class", "important" );
    my $td_before   = $matching_td->left();
    print $td_before->as_text();

    __DATA__
    <html>
    <head>
    <title>Untitled</title>
    </head>
    <body>
    <table>
    <tr>
    <td> foo </td>
    <td class="important"> </td>
    </tr>
    <tr>
    <td> bar </td>
    <td class="boring"> </td>
    </tr>
    </table>
    </body>
    </html>

    The above code prints "foo" because that's the text content of the TD immediately preceding the one with the right attribute. Though as GrandFather says, we have no idea what you really mean.



    Nobody says perl looks like line-noise any more
    kids today don't know what line-noise IS ...
Re: HTML::TreeBuilder documentation tests trouble
by badaiaqrandista (Pilgrim) on Jul 08, 2007 at 09:26 UTC

    To parse HTML I usually use HTML::TokeParser instead of HTML::TreeBuilder. The latter seems like overkill to me for most problems. This is probably how you could do the parsing with HTML::TokeParser:

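    use strict;
    use warnings;
    use HTML::TokeParser;

    my $p = HTML::TokeParser->new("index.html") || die "Can't open: $!";

    while (my $token = $p->get_token) {
        # Start-tag tokens look like ["S", $tag, $attr, $attrseq, $text],
        # so the tag name is in element 1, not element 0.
        next unless $token->[0] eq 'S' && $token->[1] eq 'td';

        my $txt = $p->get_trimmed_text;
        if ($txt eq "the text before what you're looking for") {
            # get_token takes no arguments; get_tag skips to the next <td>
            $p->get_tag("td");
            my $wanted = $p->get_trimmed_text;
            # $wanted will be the text you're looking for
        }
    }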
    -cheepy-