Alien has asked for the wisdom of the Perl Monks concerning the following question:

I often encounter websites where, after you log in to a page, some JavaScript builds the link you have to follow next to reach whatever you're looking for. That is fine for a human-controlled browser, but what about mechs? How can you get that link?

Re: How to handle javascript
by Joost (Canon) on Jan 27, 2007 at 19:08 UTC
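      JavaScript itself is not the problem; access to the DOM is. For simple JavaScript you can do something like this:
      use JavaScript;    # loads JavaScript::Runtime and JavaScript::Context
      our @URLS;
      my $runtime = JavaScript::Runtime->new();
      my $context = $runtime->create_context();
      # Replace the JS open() function with a Perl sub that records its arguments
      $context->bind_function( name => 'open', func => sub { push(@URLS, @_); return 0; } );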
      Then
      $context->eval(q(open("http://bob.com/")))
      will push http://bob.com/ onto @URLS.
      -- gam3
      A picture is worth a thousand words, but takes 200K.
Re: How to handle javascript
by andyford (Curate) on Jan 27, 2007 at 22:56 UTC
      How can a proxy be set with those clones?
        AFAIK those clones honour the http_proxy environment variable, e.g. on UNIX (sh, bash,...):
        $ export http_proxy=http://proxy.example.com:3128/
        $ perl myclonescript.pl

        Then the "clone" in myclonescript.pl will connect to every site through proxy.example.com.

        Or set it inside the script:

        BEGIN { $ENV{http_proxy} = 'http://proxy.example.com:3128/'; }
        use WWW::Mechanize ...

        Note that the environment variable must be set before any "clone" module is loaded.

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        WWW::Mechanize is a subclass of LWP::UserAgent, so you can set proxies the same way you would with LWP::UserAgent.
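A minimal sketch of that, assuming WWW::Mechanize (and therefore LWP::UserAgent) is installed. The proxy host and port are the same placeholders used in the reply above. Setting the proxy explicitly on the object means it no longer matters whether %ENV was populated before the module was loaded.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# WWW::Mechanize inherits LWP::UserAgent's proxy methods.
my $mech = WWW::Mechanize->new();

# Alternative: pick up http_proxy, https_proxy, no_proxy, etc. from %ENV.
# $mech->env_proxy;

# Set the proxy explicitly for both schemes (placeholder host and port).
$mech->proxy([qw(http https)], 'http://proxy.example.com:3128/');

# Hosts that should bypass the proxy.
$mech->no_proxy('localhost');

# Show the proxy that will be used for http:// URLs.
print $mech->proxy('http'), "\n";
```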


        ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
        =~y~b-v~a-z~s; print
Re: How to handle javascript
by blue_cowdawg (Monsignor) on Jan 27, 2007 at 19:10 UTC
        but what about mechs? How can you get that link?

    You haven't given me much to work with here, but in any event let me point you to WWW::Mechanize, HTML::TokeParser and friends. If you know what pattern the response follows, it should not be a big deal to extract the built-up link from the response string and then follow it.
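Here is a minimal sketch of the "extract the link from the response string" approach, using only core Perl. It assumes the page builds the link from quoted string literals joined with `+`. The helper `extract_js_link` and the sample markup are hypothetical. Anything the script computes at run time, such as variables, function calls, or DOM lookups, is out of its reach.

```perl
use strict;
use warnings;

# Hypothetical helper: find a link that a page's JavaScript assembles from
# quoted literals joined with '+', e.g.
#   var next = "/members/" + "area.cgi?id=" + "42";
sub extract_js_link {
    my ($html) = @_;
    my $lit = qr/"[^"]*"|'[^']*'/;    # one double- or single-quoted literal

    # An assignment whose right-hand side is: literal (+ literal)* ;
    while ($html =~ /=\s*((?:$lit)(?:\s*\+\s*(?:$lit))*)\s*;/g) {
        my $expr = $1;
        # Strip the quotes from each literal and concatenate the pieces.
        my $url = join '', map { substr($_, 1, -1) } $expr =~ /($lit)/g;
        # Keep only absolute (http/https) or root-relative URLs.
        return $url if $url =~ m{^(?:https?://|/)};
    }
    return;    # nothing that looks like a link
}

my $page = <<'HTML';
<script type="text/javascript">
  var next = "/members/" + "area.cgi?id=" + "42";
  window.location = next;
</script>
HTML

my $next = extract_js_link($page);
print defined $next ? $next : 'no link found', "\n";
```

With WWW::Mechanize you would `get` the start page, pass `$mech->content` to the helper, and then `get` the returned link. Mechanize's `get` resolves a relative link such as `/members/...` against the current page. A pattern this loose can misfire on real pages, so it is worth tightening the regex to the specific assignment the site uses.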


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg