in reply to Extracting content from html
First, let's clarify your objective. From what I understand of both your introduction and the code that follows it, you want to:
- Enter query term
- Attempt to download input file from known server/directory
- If not 404, parse html for desired link
- Attempt to download parsed link
- If not 404, parse html for desired content
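To make sure we're talking about the same flow, here is a minimal sketch of those steps, not your actual code. It assumes the core HTTP::Tiny module for the downloads; `fetch_or_undef` and `run_query` are hypothetical names, and the two parsing steps are passed in as code refs because I don't yet know what you want to extract.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: return the page body, or undef on a 404
# (or any other unsuccessful response).
sub fetch_or_undef {
    my ($url) = @_;
    my $res = HTTP::Tiny->new( timeout => 10 )->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Walk the steps listed above. $find_link and $find_content are code refs
# supplied by the caller; $fetch defaults to the HTTP::Tiny helper.
sub run_query {
    my ( $index_url, $find_link, $find_content, $fetch ) = @_;
    $fetch ||= \&fetch_or_undef;

    my $index_html = $fetch->($index_url);
    return unless defined $index_html;    # first download failed

    my $link = $find_link->($index_html);
    return unless defined $link;          # no link found in the first page

    my $page_html = $fetch->($link);
    return unless defined $page_html;     # second download failed

    return $find_content->($page_html);
}
```

Checking each download explicitly, rather than presuming it succeeded, tells you which step failed when nothing comes back.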
You seem to be able to acquire the HTML page, although you presume it's not a 404. However, to comment properly on why your HTML::TokeParser::Simple code doesn't work, you'll have to elaborate on both the content you're trying to access ("the necessary data from specific actor" is particularly vague) and the HTML that surrounds it.
I'd further assert that while descending the HTML structure will work, it may break if the site should be redesigned. Depending on what you're trying to accomplish, it may be easier to use a regex.
From what I can make of your example code, you're working with an HTML file of the form:
<html><head></head><body> <div class=wrapper> <a href="http://sub.domain.tld/folder/page.html">link text</a> </div> </body></html>
which links to a page of the form:
<html><head></head><body> <table> <tr><td class=quote>To be or not to be...</td></tr> </table> </body></html>
The apparent 'dt' vs 'td' typo aside, I'd still need to know what criteria you're trying to employ to select which actor, which quote, et al.
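In the meantime, here is a rough sketch of the regex approach, assuming the two page shapes above are accurate. The sub names `extract_link` and `extract_quotes` are mine, and the patterns simply grab the first link in the wrapper div and every quote cell, since I don't know your selection criteria.

```perl
use strict;
use warnings;

# Assumed page shapes, copied from the two examples above.
my $index_html = '<html><head></head><body> <div class=wrapper> '
               . '<a href="http://sub.domain.tld/folder/page.html">link text</a> '
               . '</div> </body></html>';
my $quote_html = '<html><head></head><body> <table> '
               . '<tr><td class=quote>To be or not to be...</td></tr> '
               . '</table> </body></html>';

# First href that follows the opening <div class=wrapper> tag
# (the attribute quotes on class are optional).
sub extract_link {
    my ($html) = @_;
    return $html =~ m{<div\s+class=["']?wrapper["']?\s*>.*?<a\s+href=["']([^"']+)["']}is
        ? $1
        : undef;
}

# Text of every <td class=quote> cell, in document order.
sub extract_quotes {
    my ($html) = @_;
    my @quotes = $html =~ m{<td\s+class=["']?quote["']?\s*>(.*?)</td>}gis;
    return @quotes;
}

print extract_link($index_html), "\n";
print "$_\n" for extract_quotes($quote_html);
```

These patterns are tied to the markup as much as a parser walk would be, so they will also need adjusting if the site changes. They are just shorter to adjust.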