thekestrel has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I'm trying to automate getting some information from a site that requires authentication (the logging-in part I've done).
Once logged in, I'm presented with a page in the browser where I can click the link I want to get said information. The HTML of the page I get after logging in has snippets like...
<a href="www.mysite.com?thing={VALUE}">click me</a>

The {VALUE}, I assume, is an environment variable that is sent back after logging in and then gets passed along to the next link when I click it.
My problem is displaying the values of these variables. I've tried dumping the environment variables through %ENV, but they're not there.
I'm using LWP::UserAgent to log in to the site over HTTPS. Is there an LWP method that will print them?
i.e.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use LWP::Debug qw(+);

    # Simplified a little bit, as the real site requires cookies, URL redirection, etc.
    my $ua   = LWP::UserAgent->new;
    my $site = "https://www.secure.com/login";
    my $req  = HTTP::Request->new(POST => $site);
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('user=a_user&pass=a_pass');
    my $res = $ua->request($req);
    if ($res->is_success) {
        print $res->content;
    }
    else {
        print $res->status_line, "\n";
    }
    # Now how do I dump the var=value pairs from this returned page???
Thanks for the advice,

Regards Paul

Re: Environment Variable Question
by davidrw (Prior) on Jun 01, 2005 at 18:16 UTC
    I think you're confusing environment variables with CGI variables and/or cookie values. My short answer would be to avoid it altogether and parse the links. The easiest way to do this is WWW::Mechanize, which is based on LWP, so you won't need to change much in your code, but you will be able to take advantage of the find_link() and find_all_links() methods. You'll be specifically interested in something like:
    # $mech is your WWW::Mechanize object, after the login request
    my @links = $mech->find_all_links( url_regex => qr/\bvar=/ );
    foreach my $link (@links) {
        # $link is a WWW::Mechanize::Link object
        next unless $link->url =~ /\bvar=([^&?]+)/;   # this RE will need tweaking
        warn "URL is " . $link->url;
        warn "Value is '$1'";
    }
Re: Environment Variable Question
by NetWallah (Canon) on Jun 01, 2005 at 18:10 UTC
    It appears that you are confusing the Client side of the HTTP connection with the SERVER-side.

    The Server WILL see the env variables (if running as CGI). The client side (Your LWP agent) has access to the page generated - nothing else.

    In other words, if you want to see what the server got, you will need the server's cooperation - i.e. a page that the server generates (perhaps for debugging) that displays the variables it received.
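
    To illustrate the client/server split, here is a minimal sketch, assuming the HTTP::Response module that ships with LWP. It builds a response by hand instead of contacting a real server, so the header values and the body are made up for illustration. It suggests that whatever the client can inspect lives in the response object, while %ENV only describes the local Perl process.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical response, built by hand to stand in for what
# $ua->request($req) would return after a login.
my $res = HTTP::Response->new(200, 'OK');
$res->header('Content-Type' => 'text/html');
$res->header('Set-Cookie'   => 'session=abc123; Path=/');
$res->content('<a href="www.mysite.com?thing=42">click me</a>');

# What the server chose to send is all in the response object.
print "Status:  ", $res->status_line, "\n";
print "Headers:\n", $res->headers->as_string;
print "Body:\n", $res->content, "\n";

# %ENV holds the environment of the local Perl process only.
print "VALUE in local %ENV? ", (exists $ENV{VALUE} ? "yes" : "no"), "\n";
```

    With a real LWP::UserAgent response the same accessors (status_line, headers, content) apply; only the construction of $res differs.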

         "There are only two truly infinite things. The universe and stupidity, and I'm not too sure about the universe"- Albert Einstein

      NetWallah,
      I think you're probably right that I'm confusing the two sides of the connection.
      I assumed there were variables transmitted back to the client side, like env variables, as it seems odd that the server would generate a link to a page and write variables as mere placeholders with no dynamic content. Otherwise, what's the point?
      Thanks for the input.

      Regards Paul
        I'll take a shot at guessing the intent of the web page :

        The {VALUE} placeholder may be used to trigger the receiving page (CGI App) to insert a pre-defined {VALUE} at a pre-determined place or places in response to the clicked item.

        Without more page context, I cannot speculate further.

             "There are only two truly infinite things. The universe and stupidity, and I'm not too sure about the universe"- Albert Einstein

Re: Environment Variable Question
by tlm (Prior) on Jun 01, 2005 at 18:17 UTC

    Let me preface my reply by noting that I barely understand your question (but that never stopped me from blurting out something :-) ). I think that what you want is to parse the HTML contents with a tool like HTML::TokeParser, which has facilities for easily listing tag attributes.
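
    A rough sketch of that idea, assuming HTML::TokeParser and URI are installed. The HTML string stands in for $res->content, and the link and its parameter values are invented for the example. It walks the <a> tags, reads each href attribute, and splits the query string into key/value pairs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Stand-in for $res->content; the link and its values are made up.
my $html = '<p><a href="http://www.mysite.com/?thing=42&amp;lang=en">click me</a></p>';

my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot parse HTML";

my @found;
while (my $tag = $parser->get_tag('a')) {
    # For a start tag, $tag is [ $tagname, \%attr, \@attrseq, $text ].
    my $href = $tag->[1]{href};
    next unless defined $href;

    # query_form returns the query string as a flat key/value list.
    my %param = URI->new($href)->query_form;
    push @found, { href => $href, param => \%param };
}

for my $link (@found) {
    print "Link: $link->{href}\n";
    print "  $_ = $link->{param}{$_}\n" for sort keys %{ $link->{param} };
}
```

    Storing the parameters in a hash drops repeated keys; if a link can carry the same key twice, keeping the flat list from query_form would be the safer choice.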

    the lowliest monk

Re: Environment Variable Question
by dynamo (Chaplain) on Jun 01, 2005 at 18:11 UTC
    The {VALUE} in the above question is not an environment variable, at least in Perl. If it were, and it were called VALUE, you'd be able to read and write it in $ENV{VALUE}.

    I'm not sure what you mean by dumping the var=value pairs from the returned page. Is the page a series of links with pairs in the URLs as given in your example? If yes, you'd have to write a regex to parse and retrieve them.

    Could you give a better description of the content of the returned page? From there it's much easier to describe how to turn it into a series of keys and values.
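
    As one possible shape for the regex approach, here is a sketch in plain Perl with no extra modules. The sample page is invented, and it assumes the links use double-quoted href attributes as in the question's snippet. Regexes are brittle against real-world HTML, so treat this as a starting point, not a general parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the returned page; the links and values are made up.
my $html = <<'HTML';
<a href="www.mysite.com?thing=42">click me</a>
<a href="www.mysite.com?thing=43&amp;page=2">or me</a>
HTML

my @pairs;
# First pull out each double-quoted href value...
while ($html =~ /<a\s[^>]*?href="([^"]*)"/gi) {
    my $url = $1;
    $url =~ s/&amp;/&/g;                 # undo the HTML escaping of '&'
    my ($query) = $url =~ /\?(.*)\z/s    # ...then keep what follows the '?'
        or next;
    # ...and split it into key=value pairs.
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        push @pairs, [ $key, defined $value ? $value : '' ];
    }
}

print "$_->[0] = $_->[1]\n" for @pairs;
```

    The pairs are kept as an ordered list of [key, value] arrays so that repeated keys across links (here, thing) are all preserved.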