passingby has asked for the wisdom of the Perl Monks concerning the following question:

Your excellencies: I used this as an example:
$booty->get("http://www.hotairballooning.com/clubs.php"); @url_array = $bot->find_all_links( text_regex => qr/www/ ); foreach(@url_array){ print $counter += 1, ":"; print $_->url(), "\n"; } # end of foreach
and it works beautifully. However, when I try to apply that same code to another page, it brings back nothing. In this case the URLs are not hardcoded but embedded in JS code (yes, I know that Mechanize does not work on JS, but I don't know exactly what that means: does it only mean that it won't be able to follow a link that is embedded in an onClick event, or does it also mean that it won't obey me if I try to use that regex to obtain the URL that sits inside this JS code?):
onclick="javascript:window.location='http://www.aaaa.org/bbb/cccc.php2 +?id=9244'
It just won't pick it up. Does this mean that the find_all_links method won't work there? Am I then left with the only option of grabbing $booty->content(), maybe turning it into plain text, and then just running the regex on that? Thank you very much. P.Y.
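
P.S. Here is a minimal sketch of the content()-plus-regex fallback I have in mind (the URL below is just a placeholder, not the real page):

use strict;
use warnings;
use WWW::Mechanize;

my $booty = WWW::Mechanize->new();
$booty->get('http://www.example.org/page-with-js-links.php');   # placeholder URL

# if find_all_links() can't see these JS links, fall back to scanning
# the raw HTML for URLs buried inside onclick handlers
my @js_urls = $booty->content =~ /window\.location\s*=\s*'([^']+)'/g;

my $counter = 0;
print ++$counter, ": $_\n" for @js_urls;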

Re: Mechanize find links query
by Corion (Patriarch) on Dec 16, 2011 at 14:56 UTC

    Mechanize will only ever give you the href attribute of the HTML when you ask it for links. Nothing else.

    If the page does not work with Javascript switched off in your browser, it will be harder to automate with Mechanize.
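
    A quick way to see what find_all_links() actually does return (a sketch, assuming $mech is your WWW::Mechanize object after a successful get()):

    for my $link ( $mech->find_all_links() ) {
        # each entry is a WWW::Mechanize::Link built from an href/src
        # attribute of a link-carrying tag; onclick handlers are never inspected
        printf "%-6s %s\n", $link->tag, $link->url;
    }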

Re: Mechanize find links query
by TJPride (Pilgrim) on Dec 16, 2011 at 15:58 UTC
    Pretty easy to pattern match if the URLs are hardcoded and always assigned directly to window.location, but if the pages get complicated and assign the URL to a variable and then assign that to window.location, or build the URL from a variable plus arguments, you'll need something that can actually parse the Javascript and run it like it would run if you were viewing it in a web browser. Don't know how you'd go about solving that one. Are you only trying to farm one site, or are you trying for a general solution that works on any web site?
      Thank you for your reply. Yes, that is what I am thinking about doing then: grab the whole page with 'content' and then pattern match it. The URLs are dynamically assigned to a variable, but the URL always looks exactly the same; the only thing that varies is the last 4 digits, which can also be grabbed without a problem, whatever combination they come in, with a character class. I am 'farming' that one website, but the issue will be common to many. Even though the links are dynamically assigned, in the end they look like they are hardcoded, that is, you don't need to click on a JS link to make them show up. I'll work on that then, thanks, regards.
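
      A rough sketch of that approach, assuming the listing page below (a placeholder) and that the id is always 4 digits:

      use strict;
      use warnings;
      use WWW::Mechanize;

      my $booty = WWW::Mechanize->new();
      $booty->get('http://www.aaaa.org/bbb/listing.php');   # placeholder for the real page

      # the target URL is fixed apart from the trailing id, so a literal
      # prefix plus \d{4} is all the pattern needs
      my @ids = $booty->content =~
          m{window\.location='http://www\.aaaa\.org/bbb/cccc\.php2\?id=(\d{4})'}g;

      print "http://www.aaaa.org/bbb/cccc.php2?id=$_\n" for @ids;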