Hello Monks. I need some advice about parsing HTML code using HTML::TreeBuilder.

I have some HTML code, and I need some info that sits inside a table tag. There are, let's say, 20 table tags, but the info I need is in table 15.

How do I know that the required info is in table 15? Well, I just search for the info in Notepad with Ctrl+F and then count the table tags from the beginning.

As you can see, the process is very tedious.

My question comes from this: HTML::TreeBuilder inherits a method from HTML::Element to dump the parsed HTML. The dumped output looks like this:

<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.3
  <select id="searchField9" name="searchField9" onchange="getUtilityListValues(this, &quot;PhoneFindListForm&quot;, updateUtilityList)" size="1"> @0.1.9.0.0.0.0.4.1.9.3.0
    <option selected value="device.name"> @0.1.9.0.0.0.0.4.1.9.3.0.0
      "Device Name"
    <option value="device.description"> @0.1.9.0.0.0.0.4.1.9.3.0.1
      "Description"
    <option value="numplan.dnorpattern"> @0.1.9.0.0.0.0.4.1.9.3.0.2
      "Directory Number"
    <option value="callingsearchspace.name"> @0.1.9.0.0.0.0.4.1.9.3.0.3
      "Calling Search Space"
    <option value="devicepool.name"> @0.1.9.0.0.0.0.4.1.9.3.0.4
      "Device Pool"
    <option value="TypeProduct.name"> @0.1.9.0.0.0.0.4.1.9.3.0.5
      "Device Type"
    <option value="pickupgroup.name"> @0.1.9.0.0.0.0.4.1.9.3.0.6
      "Call Pickup Group"
    <option value="TypeCertificateStatus.name"> @0.1.9.0.0.0.0.4.1.9.3.0.7
      "LSC Status"
    <option value="device.authenticationString"> @0.1.9.0.0.0.0.4.1.9.3.0.8
      "Authentication String"
    <option value="TypeDeviceProtocol.name"> @0.1.9.0.0.0.0.4.1.9.3.0.9
      "Device Protocol"
    <option value="securityprofile.name"> @0.1.9.0.0.0.0.4.1.9.3.0.10
      "Security Profile"
    <option value="commondeviceconfig.name"> @0.1.9.0.0.0.0.4.1.9.3.0.11
      "Common Device Configuration"
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.4
  <select id="searchLimit9" name="searchLimit9" size="1"> @0.1.9.0.0.0.0.4.1.9.4.0
    <option selected value="beginsWith"> @0.1.9.0.0.0.0.4.1.9.4.0.0
      "begins with"
    <option value="contains"> @0.1.9.0.0.0.0.4.1.9.4.0.1
      "contains"
    <option value="endsWith"> @0.1.9.0.0.0.0.4.1.9.4.0.2
      "ends with"
    <option value="isExactly"> @0.1.9.0.0.0.0.4.1.9.4.0.3
      "is exactly"
    <option value="isEmpty"> @0.1.9.0.0.0.0.4.1.9.4.0.4
      "is empty"
    <option value="isNotEmpty"> @0.1.9.0.0.0.0.4.1.9.4.0.5
      "is not empty"
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.5
  <input id="searchString9" name="searchString9" onkeypress="javascript:onEnterKey(event)" type="text" value="" /> @0.1.9.0.0.0.0.4.1.9.5.0
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.6
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.7
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.8
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.9
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.10

You can see that each element's line ends with something like "@0.1.9.0.0.0.0.4.1.9.10". This tells you the position of that element in the tree.

So the question is: do you know a way to tell the module, "hey, I want to work from @0.1.9.0.0.0.0.4.1.9.10", or another way to make the process I described simpler?
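
For reference, one possible approach is sketched below. HTML::Element documents an address method: called with no argument it returns the dotted path that the dump prints after "@", and called with such a path it returns the element at that position. The methods look_down and look_up can also find the cell or table directly, so nothing has to be counted by hand. The sample markup and the ids "first" and "second" are made up for illustration; only the class name comes from the dump above. Treat this as a minimal sketch, not a tested solution for the real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup standing in for the real page.
my $html = <<'HTML';
<html><body>
<table id="first"><tr><td>one</td></tr></table>
<table id="second"><tr><td class="cuesTableFilterAreaTd">two</td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Find the cell by an attribute instead of counting tables by hand.
# In scalar context look_down returns the first match.
my $cell = $tree->look_down(_tag => 'td', class => 'cuesTableFilterAreaTd');

# With no argument, address() returns the dotted path shown after "@" in the dump.
my $addr = $cell->address;

# With an absolute address (leading "0"), address() returns the element at that path.
my $same = $tree->address($addr);

# Climb from the cell to the table that contains it.
my $table = $cell->look_up(_tag => 'table');

# Or pick a table by position: in list context look_down returns
# every match in document order, so index 1 is the second table.
my @tables = $tree->look_down(_tag => 'table');
my $second = $tables[1];

print "cell address: $addr\n";
print "enclosing table id: ", $table->attr('id'), "\n";
```

Addresses change whenever the page layout changes, so searching by an attribute such as class or id with look_down is probably more robust than hard-coding an address or a table number.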

Thanks very much!


In reply to Parse HTML using HTML::TreeBuilder by oldwarrior32
