First, let's clarify your objective. From what I understand of both your introduction and your subsequent code:

  1. Enter a query term
  2. Attempt to download the input file from a known server/directory
  3. If not 404, parse the HTML for the desired link
  4. Attempt to download the parsed link
  5. If not 404, parse the HTML for the desired content
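To make those five steps concrete, here is a minimal sketch in Perl. It is not your code: `lookup`, `fetch_page`, and the `base_url`/`find_link`/`find_content` arguments are hypothetical names, and it assumes the core HTTP::Tiny module (Perl 5.14+) for fetching. HTTP::Tiny reports a 404 (or any other non-2xx status) by setting `success` to false in its response hash.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Returns the body of $url, or undef on a 404 or any other failure.
# $fetcher is a coderef returning an HTTP::Tiny-style response hashref,
# so the logic can be exercised without touching the network.
sub fetch_page {
    my ($fetcher, $url) = @_;
    my $res = $fetcher->($url);
    return $res->{success} ? $res->{content} : undef;
}

# Hypothetical driver for steps 1-5. base_url, find_link and find_content
# are assumptions; the query term is appended to base_url unescaped.
sub lookup {
    my (%args) = @_;
    my $fetcher = $args{fetcher} || sub { HTTP::Tiny->new->get($_[0]) };

    # Steps 1-2: build the URL from the query term and fetch it.
    my $index = fetch_page($fetcher, $args{base_url} . $args{term});
    return unless defined $index;

    # Step 3: pull the desired link out of the first page.
    my $link = $args{find_link}->($index);
    return unless defined $link;

    # Step 4: fetch the linked page.
    my $page = fetch_page($fetcher, $link);
    return unless defined $page;

    # Step 5: pull the desired content out of the second page.
    return $args{find_content}->($page);
}
```

Passing the fetcher in as a coderef lets you test the parsing against saved pages without hitting the server; in real use you would simply omit `fetcher` and let HTTP::Tiny do the download.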

You seem to be able to acquire the HTML page, although you presume it isn't a 404. However, to properly comment on why your HTML::TokeParser::Simple code doesn't work, I'd need you to elaborate on both the content you're trying to access ("the necessary data from specific actor" is particularly vague) and the HTML surrounding it.

I'd further point out that while descending the HTML structure will work, it may break if the site is redesigned. Depending on what you're trying to accomplish, a regex may be easier.

From what I can make out of your example code, you're working with an HTML file of the form:

<html><head></head><body> <div class=wrapper> <a href="http://sub.domain.tld/folder/page.html">link text</a> </div> </body></html>
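If you go the regex route, pulling the link out of a page shaped like that could look like the sketch below. `find_wrapper_link` is a hypothetical name, and it assumes the link you want is the first `<a href>` after `<div class=wrapper>`. It will misbehave on nested or unusually quoted markup, which is the fragility trade-off mentioned above.

```perl
use strict;
use warnings;

# Sample page shaped like the one described above.
my $index_html = '<html><head></head><body> <div class=wrapper> <a href="http://sub.domain.tld/folder/page.html">link text</a> </div> </body></html>';

# Assumption: the desired link is the first <a href="..."> after
# <div class=wrapper>. /s lets .*? cross newlines; /i ignores tag case.
sub find_wrapper_link {
    my ($html) = @_;
    return $html =~ m{<div\s+class=["']?wrapper["']?[^>]*>.*?<a\s+[^>]*?href=["']([^"']+)["']}si
        ? $1 : undef;
}

my $link = find_wrapper_link($index_html);
print "$link\n";    # http://sub.domain.tld/folder/page.html
```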

which links to a page of the form:

<html><head></head><body> <table> <tr><td class=quote>To be or not to be...</td></tr> </table> </body></html>
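Likewise for the quote page, a minimal regex sketch, assuming each wanted quote sits in its own `<td class=quote>` cell with no nested table cells. `find_quotes` is a hypothetical name; it returns every match in document order, so whatever actor/quote selection you settle on can be applied afterwards.

```perl
use strict;
use warnings;

my $quote_html = '<html><head></head><body> <table> <tr><td class=quote>To be or not to be...</td></tr> </table> </body></html>';

# Assumption: every wanted quote is in its own <td class=quote> cell and
# cells are not nested. In list context, //g returns all captured quotes.
sub find_quotes {
    my ($html) = @_;
    my @quotes = $html =~ m{<td\s+class=["']?quote["']?[^>]*>(.*?)</td>}sig;
    s/^\s+|\s+$//g for @quotes;    # trim surrounding whitespace
    return @quotes;
}

print "$_\n" for find_quotes($quote_html);    # To be or not to be...
```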

The apparent 'dt' vs. 'td' typo aside, I'd still need to know what criteria you're using to select which actor, which quote, etc.


In reply to Re: Extracting content from html by eibwen
in thread Extracting content from html by sdslrn123