Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to distinguish table rows in HTML that contain only links from table rows where text segments and links appear together in the same row. I'm using TreeBuilder because I need to know the depth of the different table rows. My problem is telling the difference between text segments and links, because TreeBuilder doesn't represent text segments as elements. I'm thankful for any suggestions.
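A minimal sketch of what TreeBuilder actually produces for such cells, assuming HTML::TreeBuilder (from the HTML-Tree distribution) is installed. Text segments are not HTML::Element objects, but they do appear in a cell's content_list as plain Perl strings, so ref tells them apart from child tags. The sample markup and the helper name describe_children are made up for illustration. depth is HTML::Element's own method; the root element has depth 0.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: describe each direct child of an element.
sub describe_children {
    my ($elem) = @_;
    my @desc;
    for my $node ($elem->content_list) {
        if (ref $node) {
            # Child tags come back as HTML::Element objects.
            push @desc, 'element <' . $node->tag . '>';
        }
        else {
            # Text segments come back as plain Perl strings.
            push @desc, "text '$node'";
        }
    }
    return @desc;
}

# Made-up sample: one link-only cell, one cell mixing text and a link.
my $html = '<table><tr>'
         . '<td><a href="a.html">only-a-link</a></td>'
         . '<td>text <a href="b.html">optional-link</a> more-text</td>'
         . '</tr></table>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;    # tell the parser the input is complete

for my $td ($tree->look_down(_tag => 'td')) {
    print 'td at depth ', $td->depth, ': ',
          join(', ', describe_children($td)), "\n";
}

$tree->delete;    # free the tree when done
```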

Re: Parsing HTML with TreeBuilder
by fglock (Vicar) on Sep 19, 2002 at 13:34 UTC

    Is this what you mean? You want to tell whether a row contains only cells like

    <td><a href .... > only-a-link </a></td>

    or does it also contain cells like

    <td> text <a href .... > optional-link </a> more-text </td>
      Yes, that is exactly what I mean. Do you have any suggestions? Thanks.
      The row can contain both types of cells, and I would like to be able to tell the difference between the two kinds of cells: the one with just links, and the other with links embedded in text segments.

        Assuming that $td is an HTML::Element object containing your <td> element, then:

        if (grep { !ref($_) or $_->tag ne "a" } $td->content_list) {
            # The element has non-link text
        } else {
            # The element only has <a></a> elements in it
        }

        See the HTML::Element documentation for more information. There may be a more efficient method than the above, but you haven't specified your problem well enough for me to see it.
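A slightly extended sketch of the same grep idea, assuming HTML::TreeBuilder is installed. It adds one tweak: whitespace-only text segments are ignored, so a cell written as "<td> <a ...>x</a> </td>" is not misreported as mixed if the parser keeps the surrounding spaces. Like the original test, it only looks at direct children of the <td>, so a link wrapped in, say, <b> counts as other content. The helper name classify_cell, its three labels, and the sample markup are hypothetical; the loop at the end prints each row's depth alongside its cell kinds, since depth was part of the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper. Returns:
#   'links'    - only <a> elements (plus blank text) as direct children
#   'mixed'    - <a> elements plus real text or other tags
#   'no-links' - no direct <a> child at all
sub classify_cell {
    my ($td) = @_;
    my ($links, $other) = (0, 0);
    for my $node ($td->content_list) {
        if (!ref $node) {
            $other++ if $node =~ /\S/;    # real text, not just spacing
        }
        elsif ($node->tag eq 'a') {
            $links++;
        }
        else {
            $other++;                     # some other element, e.g. <b>
        }
    }
    return 'no-links' unless $links;
    return $other ? 'mixed' : 'links';
}

# Made-up sample: a row of link-only cells, then a row with mixed cells.
my $html = '<table>'
         . '<tr><td><a href="a.html">a</a></td><td><a href="b.html">b</a></td></tr>'
         . '<tr><td>see <a href="c.html">c</a> for details</td><td>plain</td></tr>'
         . '</table>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

for my $tr ($tree->look_down(_tag => 'tr')) {
    # Only the row's own cells, not cells of tables nested inside them.
    my @cells = grep { ref($_) && $_->tag eq 'td' } $tr->content_list;
    my @kinds = map { classify_cell($_) } @cells;
    print 'row at depth ', $tr->depth, ': ', join(' ', @kinds), "\n";
}

$tree->delete;
```

A row consists only of link cells when every entry in @kinds is 'links'.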
