in reply to Embedded Table Headers with HTML::TableExtract

Since you don't provide a sample of the HTML you are attempting to parse, it's pretty much impossible to know what's going on. I think you mean that you have one table contained within another table, and it is the inner table that interests you.

Assuming that is the case, and looking at the docs for HTML::TableExtract, it appears that you can pass it a string with $te->parse($html). So I'd parse the outer table and, upon finding the element that holds the contained table, hand that text to the parse() method of a second, properly configured HTML::TableExtract instance.

Update: After playing around, I observe the following:

Using depth => <some_number> lets you pick out the embedded table you want.
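Here is a minimal sketch of what I mean. The HTML is a made-up stand-in (not the original poster's page), and I'm assuming a 2.x release of HTML::TableExtract, where the matched tables come back from tables() and each one offers coords() and rows(); older releases name some of these differently, so check the docs for your version.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample: an outer table whose second cell holds an inner table.
my $html = <<'HTML';
<table>
  <tr>
    <td>Header 1</td>
    <td>
      <table>
        <tr><td>Header 2</td><td>Inner cell</td></tr>
      </table>
    </td>
    <td>Header 3</td>
  </tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
</table>
HTML

# depth => 1 selects tables nested one level inside a top-level table;
# count => 0 selects the first table found at that depth.
my $te = HTML::TableExtract->new( depth => 1, count => 0 );
$te->parse($html);
$te->eof;    # tell the underlying HTML::Parser that the input is complete

for my $table ( $te->tables ) {
    printf "Table at depth %d, count %d\n", $table->coords;
    for my $row ( $table->rows ) {
        # Cells can be undef, so substitute an empty string before joining.
        print join( ' | ', map { defined $_ ? $_ : '' } @$row ), "\n";
    }
}
```

Only the inner table should match here; the outer table sits at depth 0 and is skipped.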

HTML::TableExtract is pretty cool.

--Bob Niederman, http://bob-n.com

Re: Re: Embedded Table Headers with HTML::TableExtract
by Anonymous Monk on Jul 13, 2003 at 03:55 UTC
    Yeah, I knew that might be an issue. But the data isn't accessible through the web (it's on an internal server behind a firewall at work). I've tried to recreate the problem in the HTML below.

    What I am trying to pull out is the "Header 2" that actually sits in the "Header 1" position of the next deeper table. Changing the depth with TableExtract will pull out a deeper table, but it doesn't link it with the previous table. The documentation for this module talks about chaining tables, but as I understand it, the chaining applies in ALL cases (i.e. not just to the Header 2 situation; Headers 1, 2, 3 and 4 would all be expected to be in a deeper table).

    This seems like a case the module probably didn't anticipate and wouldn't be expected to handle properly. However, I'm wondering if there is a way I can still get this functionality out of it. Thanks!

    <table>
      <tr>
        <td>Header 1</td>
        <td>
          <table>
            <tr><form action="search.php" name="searchsimple" method="post">
              <td><b>Header 2</b> </td>
              <td> <a href="thispage.php" ><font size="1" color="white">Update this page</font></a> </td>
              <td>
                <table>
                  <tr>
                    <td><b>Search the web:</b></font> </td>
                    <td> <input type="text" name=searchtext size=10> <input type="submit" value="Search!"> </td>
                  </tr>
                </table>
              </td>
            </tr></form>
          </table>
        </td>
        <td> Header 3</td>
        <td> Header 4</td>
      </tr>
      <tr>... data </tr>
      <tr>... data </tr>
      <tr>... data </tr>
    </table>