http://qs1969.pair.com?node_id=493645

Monks,

I return from my quest with success!! It took me a couple of days and much pondering of Monks' responses to my previous postings, but I now have code that logs in to my chosen user/password-protected HTTPS web page and then moves around the pages behind the login to do almost whatever I want or need. I thought I would post and share it, as I found no definitive answers during my quest on how to get this done.

Comments for improvement are welcome.

#!/usr/bin/perl
use strict;
use Crypt::SSLeay;
use LWP::UserAgent;
use WWW::Mechanize;
use HTTP::Cookies;
use HTTP::Request;

## The next two lines of code are used because
## I was sending my results to a web browser.
use CGI::Carp qw/ fatalsToBrowser /;
print "Content-type: text/html\n\n";

my $user         = 'MyUser';
my $pass         = 'MyPass';
my $base_address = 'https://wwws.stocksRus.com';
my $output       = '';
my $dv_data      = '';
my $webpage      = '';

my $url = 'https://wwws.stocksRus.com/cgi-bin/LogIn';

my $agent = WWW::Mechanize->new( autocheck => 1 );

# Set up cookie jar
$agent->cookie_jar(HTTP::Cookies->new);

## Go through login to get appropriate cookies
## so that I can then move on to
## the following pages with my cookie jar properly loaded.
$agent->get($url);
die $agent->response->status_line unless $agent->success;

# In order to log in properly it was necessary to send
# a hidden input called DV_DATA along with user/pass that
# appeared to be time based and was assigned on entry to
# the main login page. I extracted this value and
# assigned it as one of the inputs for the form.
# This is a critical component of the login.
$output = $agent->content;
for ($output =~ /name=\"DV_DATA\" type=\"hidden\" VALUE=\"(.*?)\">/smi){
    $dv_data = $1;
}

$agent->form_name( 'LoginFormName' );
$agent->set_fields(
    USERID   => $user,
    PASSWORD => $pass,
    DV_DATA  => $dv_data
);
$agent->submit();
### END Login

## I now have the appropriate cookies and may continue.

# On the main page there is some javascript that codes for
# the pull down menus. I could not access these using
# WWW::Mechanize and one of them had the link I needed.
# To solve, I logged into the webpage in a normal browser
# and found the url of the page that I wanted and plugged
# it in below.
## Accessing this page directly does not work as there
## is some Authorization procedure that requires going
## through login first.
$agent->get('https://wwws.stocksRus.com/cgi-bin/Quotes');
die $agent->response->status_line unless $agent->success;

# This page has a form that holds a symbol that I can
# submit on to get data on that symbol.
$agent->field('ticker', 'AMGN');
$agent->submit();

# At first the webpage wouldn't come up in my web browser
# (which is where the output from this run was sent),
# because I didn't have the base URL. All the addresses
# were similar to "/cgi-bin/menu/equity".
# I substitute the base URL in below.
$webpage = $agent->content;
$webpage =~ s/\"\//\"$base_address\//smig;
$webpage =~ s/\'\//\"$base_address\//smig;

# Prints out my webpage of interest.
print $webpage;
exit;

Chris Herold
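
For anyone who still wants to read the hidden DV_DATA value by hand (to log it, say), a list-context match is a little more direct than the for loop in the script. This is only a sketch: the helper name hidden_value and the HTML fragment are made up for illustration, and it assumes the attributes appear in the order name, type, value, as they did on the login page described above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of a named hidden input,
# or undef if no such input is found.
# Assumes the attribute order name, type, value.
sub hidden_value {
    my ($html, $name) = @_;
    # In list context a successful match returns its captures;
    # a failed match returns an empty list, leaving $value undef.
    my ($value) = $html =~ /name="\Q$name\E"\s+type="hidden"\s+value="(.*?)"/si;
    return $value;
}

# Made-up login form fragment, for illustration only.
my $html = '<form name="LoginFormName">'
         . '<input name="DV_DATA" type="hidden" VALUE="1127276280">'
         . '</form>';

my $dv_data = hidden_value($html, 'DV_DATA');
print defined $dv_data ? "DV_DATA = $dv_data\n" : "DV_DATA not found\n";
```

Returning undef on a miss lets the caller tell "field absent" apart from "field present but empty", which the original empty-string default cannot do.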

Re: WWW::Mechanize to Access HTTPS with Cookies
by Thelonius (Priest) on Sep 21, 2005 at 04:18 UTC
    WWW::Mechanize should take care of hidden fields automatically, so you could delete these lines:
    for ($output =~ /name=\"DV_DATA\" type=\"hidden\" VALUE=\"(.*?)\">/smi){
        $dv_data = $1;
    }
    and this line:
    DV_DATA => $dv_data

    This line is wrong:

    $webpage =~ s/\'\//\"$base_address\//smig;
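    because of the quote mismatch: the pattern matches a single quote but the replacement writes a double quote, so an attribute that opened with ' now opens with " while still closing with '. (Also, only the g flag is actually needed on this statement.)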

    You might want something like:

    $webpage =~ s!(=['"])/!$1$base_address/!g;
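
A quick way to see what that substitution does is to run it over a scrap of markup. The sketch below is illustrative only: the base address, the sample HTML, and the wrapper name absolutize are invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical base URL, for illustration only.
my $base_address = 'https://www.example.com';

# Hypothetical wrapper around the substitution suggested above.
sub absolutize {
    my ($html) = @_;
    # Capture the '=' plus whichever quote follows it, then write both
    # back in front of the base address so the quote style is preserved.
    $html =~ s!(=['"])/!$1$base_address/!g;
    return $html;
}

# Made-up markup with one double-quoted and one single-quoted attribute.
my $sample = q{<a href="/cgi-bin/menu/equity">Equity</a> <img src='/img/logo.gif'>};
print absolutize($sample), "\n";
```

Both attributes keep their original quote character. One caveat: a protocol-relative address such as href="//host/path" also starts with a slash and would get the base address prepended too, so this is only safe on pages that do not use that form.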
Re: WWW::Mechanize to Access HTTPS with Cookies
by mrkoffee (Scribe) on Sep 21, 2005 at 05:41 UTC

    WWW::Mechanize is a subclass of LWP::UserAgent, so you don't need the separate "use LWP::UserAgent" line; Mech will take care of that for you.

    Cheers,

    Brian

Re: WWW::Mechanize to Access HTTPS with Cookies
by gargle (Chaplain) on Sep 21, 2005 at 06:11 UTC

    Hi,

    I needed something similar the day before yesterday:

    At work we have a system that reads our badge whenever we enter or leave the building. A J2EE application has been set up to let you look up the hours you worked. Unfortunately, you have to log in to the J2EE site, search for your own name, click on your name in the results and then go to the hours reported. You'll see a calendar view with only the total hours inside the building for each day. When you click on such a total it shows you the details.

    I wanted a two month maximum overview of total hours and details and I didn't want to click myself to death, so:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();

    $mech->get("http://some.site.on.our.intranet/jsp/indexPage.do");
    $mech->submit_form(
        fields => {
            userName => 'myUserid',
            password => 'myPasswordYouDontNeedToKnow',
        }
    );
    $mech->follow_link( text_regex => qr/Search for Employee/ );
    $mech->submit_form(
        fields => {
            nom    => 'myFirstName',
            prenom => 'myLastName',
        }
    );
    $mech->follow_link( text_regex => qr/myFullName/ );
    $mech->follow_link( text_regex => qr/Overview Badge Hours/ );
    $mech->follow_link( n => 17 );    # link 17 is the previous month
    dump_hours();
    $mech->follow_link( n => 18 );    # link 18 is the next month
    dump_hours();
    $mech->follow_link( text_regex => qr/Logoff/ );
    # the end

    sub dump_hours {
        my @links = $mech->find_all_links( text_regex => qr/\d+:\d+/ );
        foreach my $link (@links) {
            my $total = $link->text();    # total hours worked
            $mech->get($link);
            my $text = $mech->content( format => 'text');
            if ( $text =~ /(... \d\d\/\d\d\/\d\d\d\d) INOUT(.+)$/ ) {
                my $date  = $1;
                my $hours = $2;
                print $date . " (" . $total . ") ";
                # $hours is a long string of hours, 5 positions wide
                while ($hours) {
                    my $slice = substr($hours,0,5)."-".substr($hours,5,5);
                    $hours .= "?"x5 if (length($hours) % 10 == 5);
                    $hours = substr($hours,10);
                    print $slice . " " ;
                }
                print "\n";
            }
        }
    }

    Output:

    bash-3.00$ ./pgase.pl
    Mon 01/08/2005 (09:17) 07:05-11:28 11:30-16:52
    Tue 02/08/2005 (09:31) 07:05-12:25 12:33-16:27 16:28-16:55 16:55-17:06
    Wed 03/08/2005 (09:27) 07:03-09:35 09:36-09:43 09:44-11:22 11:25-17:00
    Thu 04/08/2005 (11:20) 07:04-11:58 12:04-14:54 14:55-15:24 15:25-16:19 16:19-17:04 17:05-17:38 17:38-18:54
    Fri 05/08/2005 (09:58) 06:50-07:11 07:12-09:12 09:15-10:16 10:16-17:18
    Mon 08/08/2005 (09:19) 07:03-16:52
    Tue 09/08/2005 (08:18) 07:03-11:14 11:16-15:51
    Wed 10/08/2005 (08:15) 07:04-11:38 11:40-15:49
    Thu 11/08/2005 (09:05) 07:04-12:58 13:52-17:03
    Fri 12/08/2005 (09:10) 07:06-11:36 11:38-16:46
    Tue 16/08/2005 (09:07) 07:04-11:03 11:06-16:42
    Wed 17/08/2005 (09:23) 07:04-11:00 11:05-16:57
    Thu 18/08/2005 (09:29) 07:04-17:03
    Fri 19/08/2005 (09:17) 07:04-11:37 11:40-16:51
    Mon 22/08/2005 (09:15) 07:05-08:05 08:06-09:04 09:05-11:20 11:24-15:15 15:16-15:31 15:32-16:50
    Tue 23/08/2005 (09:32) 07:03-12:40 12:49-17:06
    Wed 24/08/2005 (09:19) 07:05-11:23 11:27-16:54
    Thu 25/08/2005 (09:23) 07:04-11:10 11:13-11:50 11:51-12:04 12:04-16:58
    Fri 26/08/2005 (09:27) 07:03-11:39 11:47-14:02 14:03-14:23 14:24-17:01
    Mon 29/08/2005 (09:27) 07:04-11:35 11:40-17:01
    Tue 30/08/2005 (08:17) 07:04-11:04 11:09-15:52
    Wed 31/08/2005 (08:35) 07:04-07:10 07:10-11:31 12:45-16:54
    Thu 01/09/2005 (10:13) 07:06-17:50
    Fri 02/09/2005 (09:28) 07:05-08:36 09:04-17:03
    Thu 08/09/2005 (00:37) 16:31-16:35 16:36-17:09
    Mon 12/09/2005 (09:16) 07:06-12:40 12:45-13:10 13:11-16:53
    Tue 13/09/2005 (09:10) 07:04-11:41 11:45-11:46 11:47-16:45
    Wed 14/09/2005 (09:35) 07:04-17:10
    Mon 19/09/2005 (08:44) 07:36-12:52 12:56-15:14 15:14-15:21 15:22-16:50
    Tue 20/09/2005 (07:17) 07:34-09:52 09:52-11:16 11:17-11:37 11:38-11:44 13:52-17:01

    Ok, it's far from perfect (perfect would be to connect directly to the database and select the hours with an SQL statement, but unfortunately we don't have direct access). If the page layout changes, my little 'format => text and regexp' trick will break and the program will stop working.
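
The step that cuts the long run of badge times into in/out pairs can also be sketched with unpack instead of repeated substr calls. This is a variant, not the code from the script above: the function name pair_hours is hypothetical, the input is assumed to be back-to-back five-character HH:MM stamps as the comment in the script describes, and an unpaired final stamp is given a '?????' partner, which appears to be what the padding in the loop was aiming for.

```perl
use strict;
use warnings;

# Hypothetical helper: split a run of fixed-width HH:MM stamps into
# "in-out" pairs. Assumes each stamp is exactly 5 characters wide.
sub pair_hours {
    my ($hours) = @_;
    # '(A5)*' repeats a 5-character field until the string is used up.
    my @stamps = unpack '(A5)*', $hours;
    my @pairs;
    while (@stamps) {
        my $in  = shift @stamps;
        # Use a placeholder when a badge-in has no matching badge-out.
        my $out = @stamps ? shift @stamps : '?????';
        push @pairs, "$in-$out";
    }
    return @pairs;
}

# Made-up sample strings: one fully paired, one with a dangling stamp.
print join(' ', pair_hours('07:0511:2811:3016:52')), "\n";
print join(' ', pair_hours('07:0511:2811:30')), "\n";
```

Keeping the pairing in its own function also makes it easy to check without logging in to the site.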

    --
    if ( 1 ) { $postman->ring() for (1..2); }