PerlMonks  

Re^3: WWW::Mechanize find_link question.

by Adrade (Pilgrim)
on May 13, 2005 at 06:41 UTC ( [id://456615]=note )


in reply to Re^2: WWW::Mechanize find_link question.
in thread WWW::Mechanize find_link question.

Dear Merlyn,

It so happens that this particular user is trying to parse specifically formatted HTML. I would normally agree with you, but a regex is especially convenient when one is expecting data of a certain structure - this seems to meet that condition.

Also, I'm interested in how you would modify the regex to meet your more stringent requirements. Always looking to better my ability here.

  -Adam

Replies are listed 'Best First'.
Re^4: WWW::Mechanize find_link question.
by merlyn (Sage) on May 13, 2005 at 10:11 UTC
    Perhaps you missed the "WWW::Mechanize" in the subject? The original poster is already using Mechanize, has already had the document parsed by proper means under the hood, and the question was about using find_link properly.

    Thus, a solution to abandon all that seems crazy. That's the craziness I was pointing out.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Dear Merlyn,

      Oh, I see what you mean, but no, I didn't miss the subject line. Thank you for noticing that as a possibility.

      Perhaps it is because I'm new to this board that I don't understand the standardizations that have been universally adopted here - but I code according to the original Perl philosophy of TMTOWTDI, and the actual question asked, "I need to extract the last link (and only the last link) which contains the image. Is there a way to do this which I'm not seeing?", is the one to which I was responding.

      Best,
        -Adam
Re^4: WWW::Mechanize find_link question.
by tphyahoo (Vicar) on May 13, 2005 at 09:41 UTC
    The above exchange over how to parse HTML seems to be a running controversy in the monastery.

    In "Being a heretic and going against the party line", BrowserUk criticizes "cargo cult" reliance on HTML::TokeParser, HTML::TreeBuilder, and other HTML::* modules when regexes would do fine, and also argues that the HTML::* modules are hard to learn and don't deserve the praise the community gives them.

    That was in reply to "Parsing HTML tags with regex", which is a good starting thread for various methods of parsing HTML, including BrowserUk's simple regex solution, which led to all the controversy after he got downvoted.
