Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I've been trying to understand how Perl might populate the information of a browser object and then create an actual browser with a GUI from it. I've found that the toolchain I've been working with is either being misused (by someone of limited experience) or is not up to the task. My endeavor is consistent with the Virtue of Laziness: I want to minimize repeated clicks on the same buttons and retyping of the same text at sites I frequent. In particular, I'd like to imitate the keystrokes needed for personal banking, where SSL is used, as indicated by 'https' in the URL. The documentation of WWW::Mechanize says it can handle SSL, but I get nowhere with it unless I use LWP::UserAgent.

Update

Apparently, the previous sentence is wrong. The following code works exactly as you'd expect, although I had believed this approach failed:

#! /usr/bin/perl
use warnings;
use strict;
use 5.010;
use WWW::Mechanize;

my $url = 'https://www.huntington.com/';
my $mech = WWW::Mechanize->new;
$mech->get($url);
my $c = $mech->content;
say "c is $c";

my $url2 = 'https://berniesanders.com/issues/racial-justice/';
my $mech2 = WWW::Mechanize->new;
$mech2->get($url2);
my $c2 = $mech2->content;
say "c2 is $c2";

Let's look at some code, one of many tries:

#! /usr/bin/perl
use warnings;
use strict;
use 5.010;
use LWP::UserAgent;
use HTML::Display;

my $url = 'https://www.huntington.com/';
my $ua = LWP::UserAgent->new;
$ua->agent( 'Windows Mozilla/5.0');
my $response = $ua->get($url);
my $content = $response->content;

$ENV{'PERL_HTML_DISPLAY_COMMAND'}='run "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s';
my $browser=HTML::Display->new();
if (defined($browser)) {
    $browser->display(html=>$content);
} else {
    print("Unable to open browser: $@\n");
}

This succeeds in loading the content from this site, but the page is nearly unrecognizable, as the .css does not load, nor does anything else that depends on relative addressing. What's more, the login link, whilst active, does not function, so it's a complete washout. I've tried variations, but none of them seem to rise above the limitation that the browser believes the page lives on my drive rather than on a remote server, as this typical URL shows:

file:///C:/cygwin64/tmp/9EQdRdu_5w.html

Might I have different luck with a different browser? I doubt it, following this approach. My question is this: does Perl have a browser that it can cook up from scratch, that is, with the help of CPAN?

Q2) Could I "harvest" the URL from an automated session and pass it as an argument when a black-box browser is instantiated?

Q3) Has anyone in the history of perl used it to create a means to watch netflix, hbogo or hulu?

Thanks for your comment.

Another Update

My apologies for another revision to the original post. I looked at the responses to another recent node that was somewhat related to mine and found the following resource, which has answered at least two of my questions directly: http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod

#!/usr/bin/perl

# turn on perl's safety features
use strict;
use warnings;

# work out the name of the module we're looking for
my $module_name = $ARGV[0]
  or die "Must specify module name on command line";

# create a new browser
use WWW::Mechanize;
my $browser = WWW::Mechanize->new();

# tell it to get the main page
$browser->get("http://search.cpan.org/");

# okay, fill in the box with the name of the
# module we want to look up
$browser->form_number(1);
$browser->field("query", $module_name);
$browser->click();

# click on the link that matches the module name
$browser->follow_link( text_regex => $module_name );

my $url = $browser->uri;

# launch a browser...
system('galeon', $url);

exit(0);

Where I'm falling short now is how to rewrite the system command to be appropriate for Windows 8. Note that the syntax I used for the HTML::Display program above was indeed able to instantiate chrome.exe, so I feel like I'm close. I wrote a little test program to try to find the right syntax:

#! /usr/bin/perl
use warnings;
use strict;
use 5.010;

my $url = 'https://www.youtube.com/watch?v=ju1IMxGSuNE';
system( 'run C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', $url );
exit(0);

and got this response from my dos window:

The filename, directory name, or volume label syntax is incorrect.

Replies are listed 'Best First'.
Re: creating a useful browser from automation
by Corion (Patriarch) on Sep 05, 2015 at 07:44 UTC

    My recommendation is to avoid LWP::UserAgent for that use case and automate a browser instead. For example WWW::Mechanize::Firefox can automate Firefox and navigate you to a page following a given set of steps. For Internet Explorer there is Win32::IEAutomation. Also take a look at Selenium::Remote::Driver which maybe can automate other browsers.

      Selenium, being a testing tool that marshals the services of actual browser processes, can be made to fit this requirement. Although it ordinarily shuts down the browser instance upon completion of the tests, it does not have to. So, all of the “hard part” of what you are trying to do here ... has already been done! (Yay!)

Re: creating a useful browser from automation
by 1nickt (Canon) on Sep 05, 2015 at 04:59 UTC

    Hi datz_cozee75. You're in luck because Corion, the author of HTML::Display, is here almost every day. I've never used it, but I have used WWW::Mechanize, and both it and apparently HTML::Display allow you to set the base url from which all links in your content are relative.

    The below script works for me on my Macbook, launching chrome and displaying the page correctly I think:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTML::Display;

    my $base_url = 'https://www.huntington.com/';
    my $spider = WWW::Mechanize->new( autocheck => 1 );
    $spider->get( $base_url );
    die( "$base_url: " . $spider->response->status_line ) unless $spider->success;

    my $browser = HTML::Display->new();
    $browser->display( html => $spider->content( base_href => $base_url ) );
    __END__
    Hope this helps!

    The way forward always starts with a minimal test.
Re: creating a useful browser from automation
by 1nickt (Canon) on Sep 06, 2015 at 03:00 UTC

    I don't follow. Why are you not using HTML::Display with $ENV{'PERL_HTML_DISPLAY_COMMAND'}='run "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s'; as in your first example? Why the system call? That's an example from WWW::Mechanize for when you are not using HTML::Display.

    $ENV{'PERL_HTML_DISPLAY_COMMAND'}='run "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s';
    my $browser = HTML::Display->new();

    What happened when you tried the script I posted?

    The way forward always starts with a minimal test.

      The reason I went with a system call is that it was in code I encountered after I made the original post. Also, I hadn't been able to make HTML::Display work with my meager experience. It turns out that your script runs just fine on my Windows 8 machine:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use WWW::Mechanize;
      use HTML::Display;

      $ENV{'PERL_HTML_DISPLAY_COMMAND'}='run "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s';

      my $base_url = 'https://www.huntington.com/';
      my $spider = WWW::Mechanize->new( autocheck => 1 );
      $spider->get( $base_url );
      die( "$base_url: " . $spider->response->status_line ) unless $spider->success;

      my $browser = HTML::Display->new();
      $browser->display( html => $spider->content( base_href => $base_url ) );
      __END__

      So it is that I have to admit that I have another error from the original post, namely that HTML::Display seems to work fine as long as it's called correctly, in particular, with a base_url properly specified.
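
To make the role of the base URL concrete: as far as I understand the WWW::Mechanize documentation, content( base_href => ... ) returns the HTML with a <base href="..."> element added to the header, so that relative links and stylesheets resolve against the remote site instead of the temporary file:// location. The following is only a hand-rolled sketch of that idea using core Perl. The helper add_base_href and the sample page are hypothetical, not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT part of WWW::Mechanize or HTML::Display):
# insert a <base href="..."> element right after the opening <head> tag,
# so a browser resolves relative links against $base.
sub add_base_href {
    my ($html, $base) = @_;

    # Escape the two characters that could break the attribute value.
    (my $safe = $base) =~ s/&/&amp;/g;
    $safe =~ s/"/&quot;/g;

    # Only the first <head ...> tag is touched; the rest is left as-is.
    $html =~ s{(<head\b[^>]*>)}{$1<base href="$safe">}i;
    return $html;
}

# Made-up sample page with a relative link, for illustration only.
my $page  = '<html><head><title>t</title></head>'
          . '<body><a href="/login">Login</a></body></html>';
my $fixed = add_base_href( $page, 'https://www.huntington.com/' );

print "$fixed\n";
```

With the base element in place, a browser that opens the saved file should treat /login as https://www.huntington.com/login rather than as a path on the local drive. In real use, the module's own base_href option is the safer choice than a regex like this.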

      I have to attend to the pesky invasion of real life, so I'll leave it with that.

Re: creating a useful browser from automation
by soonix (Chancellor) on Sep 07, 2015 at 14:36 UTC
    system( 'run C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', ...
    And if you remove that "run"?

      Works fine! I picked up the run syntax from an old pm node and thought it was necessary. I'll just have to add that to the list of misconceptions I had in this thread...

      #!/usr/bin/perl
      use strict;
      use warnings;
      use WWW::Mechanize;
      use HTML::Display;

      $ENV{'PERL_HTML_DISPLAY_COMMAND'}='"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s';

      my $base_url = 'https://www.huntington.com/';
      my $spider = WWW::Mechanize->new( autocheck => 1 );
      $spider->get( $base_url );
      die( "$base_url: " . $spider->response->status_line ) unless $spider->success;

      my $browser = HTML::Display->new();
      $browser->display( html => $spider->content( base_href => $base_url ) );
      __END__
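
The same advice about dropping "run" should carry over to the plain system call from the earlier test program. Below is a minimal sketch, assuming Chrome is installed at the path quoted in this thread. It passes the executable and the URL as separate list elements, so the program name is just the path and nothing else. I have not verified this on every Perl build for Windows (Cygwin, Strawberry, ActiveState), so treat it as a starting point.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Path as quoted in the thread; adjust if Chrome lives elsewhere.
# Single quotes keep the backslashes literal.
my $chrome = 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe';
my $url    = 'https://www.youtube.com/watch?v=ju1IMxGSuNE';

# List form: program first, then each argument as its own element.
# No "run" prefix, and no extra quoting inside the strings.
my @cmd = ( $chrome, $url );

if ( -e $chrome ) {
    # system returns 0 when the command ran and exited successfully.
    system(@cmd) == 0
      or warn "Could not launch browser (status $?)\n";
}
else {
    # On a machine without Chrome at that path, just show the command.
    print "Would run: @cmd\n";
}
```

The -e check is there only so the script fails softly on machines where the path does not exist.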