passingby has asked for the wisdom of the Perl Monks concerning the following question:

Your excellencies: I used this as an example:
$booty->get("http://www.hotairballooning.com/clubs.php"); @url_array = $bot->find_all_links( text_regex => qr/www/ ); foreach(@url_array){ print $counter += 1, ":"; print $_->url(), "\n"; } # end of foreach
and it works beautifully. However, when I try to apply that same code to another page, it brings back nothing. In this case the URLs are not hardcoded but embedded in JS code (yes, I know that Mechanize does not work on JS, but I don't know exactly what that means: does it only mean that it won't be able to follow a link that is embedded in an onClick event, or does it also mean that it won't obey me if I try to use that regex to obtain the URL that sits inside this JS code?):
onclick="javascript:window.location='http://www.aaaa.org/bbb/cccc.php2 +?id=9244'
It just won't pick it up. Does this mean that the find_all_links method won't work there? Am I then left with the only option of grabbing $booty->content(), maybe turning it into plain text, and then just running the regex on that? Thank you very much. P.Y.
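
P.S. Here is a minimal sketch of the content()-plus-regex fallback I have in mind (the URL below is just a placeholder, not the real page):

use strict;
use warnings;
use WWW::Mechanize;

my $booty = WWW::Mechanize->new();
$booty->get('http://www.example.org/page-with-js-links.php');   # placeholder URL

# if find_all_links() can't see these JS links, fall back to scanning
# the raw HTML for URLs buried inside onclick handlers
my @js_urls = $booty->content =~ /window\.location\s*=\s*'([^']+)'/g;

my $counter = 0;
print ++$counter, ": $_\n" for @js_urls;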

Re: Mechanize find links query
by Corion (Patriarch) on Dec 16, 2011 at 14:56 UTC

    Mechanize will only ever give you the href attribute of the HTML when you ask it for links. Nothing else.

    If the page does not work with Javascript switched off in your browser, it will be harder to automate with Mechanize.
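
    A quick way to see what find_all_links() actually does return (a sketch, assuming $mech is your WWW::Mechanize object after a successful get()):

    for my $link ( $mech->find_all_links() ) {
        # each entry is a WWW::Mechanize::Link built from an href/src
        # attribute of a link-carrying tag; onclick handlers are never inspected
        printf "%-6s %s\n", $link->tag, $link->url;
    }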

Re: Mechanize find links query
by TJPride (Pilgrim) on Dec 16, 2011 at 15:58 UTC
    Pretty easy to pattern match if the URLs are hardcoded and always assigned directly to window.location, but if the pages get complicated and assign the URL to a variable and then assign that to window.location, or build the URL from a variable plus arguments, you'll need something that can actually parse the Javascript and run it like it would run if you were viewing it in a web browser. Don't know how you'd go about solving that one. Are you only trying to farm one site, or are you trying for a general solution that works on any web site?
      Thank you for your reply. Yes, that is what I am thinking about doing then: grab the whole page with 'content' and then pattern match it. The URLs are dynamically assigned to a variable, but the URL always looks exactly the same; the only thing that varies is the last 4 digits, which can also be grabbed without a problem, whatever combination they come in, with a character class. I am 'farming' that one website, but the issue will be common to many. Even though the links are dynamically assigned, in the end they look like they are hardcoded, that is, you don't need to click on a JS link to make them show up. I'll work on that then, thanks, regards.
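
      A rough sketch of that approach, assuming the listing page below (a placeholder) and that the id is always 4 digits:

      use strict;
      use warnings;
      use WWW::Mechanize;

      my $booty = WWW::Mechanize->new();
      $booty->get('http://www.aaaa.org/bbb/listing.php');   # placeholder for the real page

      # the target URL is fixed apart from the trailing id, so a literal
      # prefix plus \d{4} is all the pattern needs
      my @ids = $booty->content =~
          m{window\.location='http://www\.aaaa\.org/bbb/cccc\.php2\?id=(\d{4})'}g;

      print "http://www.aaaa.org/bbb/cccc.php2?id=$_\n" for @ids;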