
Is the table you speak of in a frame (or an iframe), or is it written out by JavaScript (document.write or such)? Since you didn't give us the URL or the complete HTML (in readmore tags), it is difficult to help. I am guessing you are using WWW::Mechanize to get this page, but since you did not tell us exactly what you are doing, you have not made it easy for people to help you. See the PerlMonks FAQ and How do I post a question effectively?.

Martin

Re^4: Strip PHP page
by bauer1sc (Initiate) on Aug 06, 2007 at 18:23 UTC
    Sorry, Martin, if my question was unclear. I am using WWW::Mechanize to get the contents of the website and then using HTML::Strip to get rid of the HTML tags. The URL is
    http://www.whosregistered.com/iso/form.php
    The only option I want to specify is the country, United States. Run the search and the results are in the table. Thanks again for your help.

      The info you appear to be seeking is all contained between the (nested) <table class="searchbox"> and the next-following </table> (in each page, of which there are many; iteration from page to page is left as an exercise for the student).
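A minimal sketch of that capture step, using only core Perl. The helper name `extract_searchbox` and the sample page are invented for illustration, and a non-greedy regex is assumed to be good enough because the searchbox table is taken to contain no further nested tables; a real HTML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text from <table class="searchbox"> through
# the next-following </table>, or undef if the page has no such table.
sub extract_searchbox {
    my ($html) = @_;
    # Non-greedy .*? stops at the first </table> after the opening tag.
    return $html =~ m{(<table\s+class="searchbox"[^>]*>.*?</table>)}is ? $1 : undef;
}

# Invented sample standing in for one fetched results page.
my $page = <<'HTML';
<table><tr><td>
<table class="searchbox">
<tr><td></td><td>Company</td><td>State</td></tr>
<tr><td><a href="./form.php?stage=3">More</a></td><td>Acme Corp</td><td>OH</td></tr>
</table>
</td></tr></table>
HTML

my $table = extract_searchbox($page);
print defined($table) ? $table : "no searchbox table found", "\n";
```

In a real run, `$page` would be the content WWW::Mechanize returned for each results page.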

      The only PHP in that table is in the first non-empty cell of each row, starting with the second[1] row, to wit:

      <td><a href="./form.php?stage=3&search_total=http ... stage%3D2&connector_id=1018">More</a></td>

      [1] OT: Row one really should be in a <thead><tr><th>....</thead> construct.

      Your sample output suggests that you don't need the detail produced by the PHP links noted above, so there's little reason to do more than capture that table (in each of the roughly 1000 pages, each listing approximately 30 US companies with relevant ISO certificates) and strip the HTML within it. You'll have a column with the word 'More' beginning each row, but that's scarcely the end of the world and is simply cured with a substitution after the HTML cleanup (something not much more complex than s/^\tMore\t//; perhaps? Untested, as even the source view of your sample does NOT tell me exactly what's at the start of your row, before and after 'More').
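A rough sketch of the cleanup described above. To stay runnable with core Perl only, it swaps HTML::Strip for plain regexes, so the whitespace it produces (and therefore the exact 'More' substitution) is an assumption that will likely need adjusting against real HTML::Strip output. The helper name `table_to_rows` and the sample table are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one captured table into tab-separated lines,
# one per row, dropping the leading 'More' link cell.
sub table_to_rows {
    my ($table) = @_;
    my @rows;
    while ($table =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row = $1;
        # Collect the contents of each <td> or <th> cell in this row.
        my @cells = $row =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}gis;
        s/<[^>]+>//g     for @cells;   # remove any tags left inside a cell
        s/^\s+|\s+$//g   for @cells;   # trim surrounding whitespace
        my $line = join "\t", @cells;
        # Assumption: 'More' is the very first field of a data row.
        $line =~ s/^More\t//;
        push @rows, $line;
    }
    return @rows;
}

# Invented sample standing in for one captured searchbox table.
my $sample = <<'HTML';
<table class="searchbox">
<tr><td></td><td>Company</td><td>State</td></tr>
<tr><td><a href="./form.php?stage=3">More</a></td><td>Acme Corp</td><td>OH</td></tr>
</table>
HTML

print "$_\n" for table_to_rows($sample);
```

The header row keeps its empty first field here; whether that matters depends on what you do with the output afterwards.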