Discipulus has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks!

I read in the documentation of WWW::Mechanize::Chrome that this is the support forum of the module... grin.. grin..

First of all, thanks for this wonderful module! What I'm trying, with some success, is to automate a very complex page full of JS and other amenities, but the following questions also apply to other websites, I suppose. So here are the questions (my setup is at the end of the post):

1) this is really silly: I see a banner in the browser telling me that the browser is being controlled by automated software. Is this detectable on the website's side? I mean: will something in the user agent reveal that it is an automation bot? Or could it be spotted by examining the access log?

2) even with incognito => 0 in the constructor, the browser (both chrome and chromium) always opens a normal window and an incognito one; the second one is the automated one.

I didn't find much about incognito mode on the web, nor by grepping the distribution.

3) tabs: I understood that tab => "Title of The Tab" in the constructor is meant to connect to a previously opened tab. But how, if every instance of chrome/chromium has to be shut down before mechanizing it? See here too

Short answers are welcome too. I mean, I fear I won't understand too many under-the-hood details ;)

My setup:

windows 7
strawberry perl portable 5.24.1

PATH=C:\ulisse\perl5.24-64b\perl\site\bin;C:\ulisse\perl5.24-64b\perl\bin;C:\ulisse\perl5.24-64b\c\bin;C:\ulisse\bin\UnxUtils\usr\local\wbin;C:\Windows;C:\Windows\system32;

chrome version 81.0.4044.129 (Official Build) (64-bit)
chromium version 84.0.4135.0 (Build) (64-bit)

perl -MWWW::Mechanize::Chrome -e "print $WWW::Mechanize::Chrome::VERSION"
0.48

perl -MIO::Async -e "print $IO::Async::VERSION"
0.75 # fails test (forced installation) see my report at http://www.cpantesters.org/cpan/report/7a39b6be-6c02-1014-bdb8-3bd4031ef651

Thanks!

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Replies are listed 'Best First'.
Re: some doubts on my first steps with WWW::Mechanize::Chrome
by Corion (Patriarch) on May 04, 2020 at 06:16 UTC

    Thank you for using my module and giving feedback! It's always great to find people actually using your software!

    The banner telling that the software is automated is (to my knowledge) not directly detectable by the website. If the website runs JavaScript, I think it can detect that the page area is smaller than it "should" be, but other than that, there is no way to detect that there is nobody sitting in front of the browser.

    I think the non-incognito mode is somewhat broken if you launch your own window. I'll have to add more tests to that. Currently I don't have a use case for this, so this has fallen a bit into neglect. I'm also not sure how I can test this behaviour well.

    For reusing a tab, you will need to have the "main" Chrome instance started with --remote-debugging-port=9222. This will allow Perl to access your main Chrome instance. This is not really great, but I didn't find a better way to let Perl shoulder-surf.

      Thanks Corion,

      > I think the non-incognito mode is somewhat broken if you launch your own window.

      I don't get the sense of this: do you mean launching it manually or programmatically? Does this have to be combined with the following answer? That is, do I have to launch a new instance of chromium/chrome alone (as the only instance of the program) with --remote-debugging-port=9222, and then I can take control of an already open tab (opened manually) and also have incognito mode turned off?

      L*


        You have to manually launch "your" (the first) Chrome instance with --remote-debugging-port=9222.

        Afterwards, if you launch a WWW::Mechanize::Chrome program, you tell it to connect to an existing tab and it should connect to the already running Chrome instance which has your cookies etc.
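To see which tabs a tab => qr/.../ argument could match, one option is to ask the running Chrome instance directly. The sketch below assumes Chrome was started with --remote-debugging-port=9222 and queries the /json/list endpoint of the DevTools HTTP interface. The helper names list_targets and tabs_matching are hypothetical and not part of WWW::Mechanize::Chrome, and filtering on type "page" is an assumption about which targets count as tabs.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

binmode STDOUT, ':encoding(UTF-8)';

# Fetch the list of debuggable targets from a Chrome instance that was
# started with --remote-debugging-port. Returns an array reference,
# which is empty if nothing answers on that port.
# (hypothetical helper, not part of WWW::Mechanize::Chrome)
sub list_targets {
    my ($port, $host) = @_;
    $host //= '127.0.0.1';
    my $res = HTTP::Tiny->new( timeout => 2 )->get("http://$host:$port/json/list");
    return [] unless $res->{success};
    my $targets = eval { decode_json( $res->{content} ) };
    return ref($targets) eq 'ARRAY' ? $targets : [];
}

# Keep only targets of type "page" whose title matches the given regex.
# (assumption: "page" targets are the ones that correspond to tabs)
sub tabs_matching {
    my ($targets, $title_re) = @_;
    return grep {
        ( $_->{type} // '' ) eq 'page' && ( $_->{title} // '' ) =~ $title_re
    } @$targets;
}

# Usage: show which open tabs a tab => qr/PerlMonks/ argument could match.
for my $tab ( tabs_matching( list_targets(9222), qr/PerlMonks/ ) ) {
    print "$tab->{title} <$tab->{url}>\n";
}
```

If the loop prints nothing, either no tab title matches or no browser is listening on that port.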

      Hi,

      "other than that, there is no way to detect that there is nobody sitting in front of the browser"

      As I mentioned in CB to Discipulus, my experience is that sadly, websites go to great and effective lengths to detect that very thing. Some specific things I have encountered that forced me to use Selenium::Remote::Driver to drive the browser:

      • Filling in a form field too fast (i.e. simply pasting/setting the value; I needed to add a delay between keystrokes)
      • An invisible modal overlaid above the content (the form field is still in the DOM, but a human user has to click once to clear the modal and then into the form field)
      • Filling in a form field that is deliberately located "below the fold" (i.e. off screen, necessitating a human user to scroll the window)
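The first point in the list above (a delay between keystrokes) can be sketched independently of any particular driver. This is a minimal illustration, not what any of the modules mentioned here do internally: type_slowly is a hypothetical helper, and $send_key is a hypothetical callback standing in for whatever your driver offers to send one keystroke to the focused field. The default delays are arbitrary assumptions.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Call $send_key->($char) once per character of $text, pausing a random
# interval between $min and $max seconds after each character.
# Returns the number of characters sent.
sub type_slowly {
    my ($text, $send_key, $min, $max) = @_;
    $min //= 0.05;    # assumed default, in seconds
    $max //= 0.20;    # assumed default, in seconds
    for my $char ( split //, $text ) {
        $send_key->($char);
        # rand(0) would behave like rand(1), so guard the zero-width case
        my $pause = $min + ( $max > $min ? rand( $max - $min ) : 0 );
        sleep($pause) if $pause > 0;
    }
    return length $text;
}

# Usage with a stand-in callback that just prints each "keystroke".
$| = 1;
type_slowly( "Discipulus", sub { print $_[0] } );
print "\n";
```

A random pause rather than a fixed one is a design choice here, on the assumption that perfectly regular timing is itself easy to spot.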

      Hope this helps!


      The way forward always starts with a minimal test.

        and I was wondering what strategies they may use for tele-examining the students.

Re: some doubts on my first steps with WWW::Mechanize::Chrome -- further steps and doubts
by Discipulus (Canon) on May 20, 2020 at 17:38 UTC
    Thanks Corion and all,

    I still have some stupid questions about the web in general, and some issues with the (anyway wonderful) WWW::Mechanize::Chrome module.

    Stupid question: if I use $mech->eval_in_page ( qq(  xajax_viewWindow(container.open({saveName: 'profiles', title: ''}) ) )); with an empty title, while the original code of the page had title: 'TITLE', is this action just local to my browser, or can it be caught on the server side by spotting the missing title? Sorry for my limited understanding of the whole web mechanism.

    More on WMC: in addition to the incognito param (the author will fix it soon ;) I also have a problem with the autoclose one. The following not-so-short example demonstrates it. In brief: if the browser was started by the Perl program, autoclose => 0 has no effect. This happens with both chrome and chromium. On the other hand, if the browser was started before the Perl program with --remote-debugging-port=9222 as an argument, then autoclose => 0 works.

    The opposite setting, autoclose => 1, works as expected, closing the browser at the end of the Perl program, both when the browser was started before Perl and when it was launched by Perl.

    use strict;
    use warnings;
    use Log::Log4perl qw(:easy);
    use WWW::Mechanize::Chrome;

    Log::Log4perl->easy_init($ERROR); # Set priority of root logger to ERROR

    my $mech;

    my( $chrome, $diagnosis ) = WWW::Mechanize::Chrome->find_executable();
    print "error: [$diagnosis]\n" if $diagnosis;

    print "Should I use ",( $chrome ? "[$chrome]" : "-NOT FOUND-" ),
          " or another executable (put full path or leave blank for default)\n";
    my $chrome_path = <STDIN>;
    chomp $chrome_path;
    $chrome_path = $chrome_path ? $chrome_path : $chrome;

    print "I have to use an existing chrome tab ( for example PerlMonks )? leave it blank to open a new browser instance\n";
    my $tab_title = <STDIN>;
    chomp $tab_title;

    if ( $tab_title ){
        $mech = WWW::Mechanize::Chrome->new(
            autoclose => 0, # has no effect
            autodie   => 0,
            incognito => 0, # has no effect
            tab       => qr/$tab_title/,
        );
    }
    else{
        print "Give me the full url to connect ( for example https://www.perlmonks.org )\n";
        my $url = <STDIN>;
        chomp $url;
        $mech = WWW::Mechanize::Chrome->new(
            autoclose  => 0, # has no effect
            autodie    => 0,
            incognito  => 0, # has no effect
            launch_exe => $chrome_path,
            launch_arg => [ "--remote-debugging-port=9222" ]
        );
        $mech->get( $url );
    }

    sleep 5;

    # go from Monastery Gates to Newest Nodes
    $mech->click ( { selector => '#titlebar-top tbody tr td.monktitlebar ul li:nth-child(15) a'} );

    print "Press ENTER to continue... (if the browser was open by $0 will be closed despite of autoclose => 0)";
    my $ready = <STDIN>;

    Thanks in advance

    L*

    PS I'd also like to know whether the currently running chrome instance was started with the correct parameter --remote-debugging-port=9222. Shooting in the dark, I tried $mech->target(), but I'm a bit confused: if I use use Data::Dump; dd $mech->target(); it dies with:

    Cannot use IO::Async::Stream::Writer as an ARRAY reference at C:/ulisse/perl5.24-64b/perl/vendor/lib/Data/Dump.pm line 275.

    and now I remember I forced the installation of IO::Async without testing..

    With good old Data::Dumper I get a whole mess of lines containing 'port' => 9222,'is_connected' => 1, which sounds promising, but sincerely I dunno how to access it: after the first level it becomes difficult to parse:

    print $_.$/ for keys %{ $mech->target() };

    sessionId
    listener
    have_target_info
    sequence_number
    json
    receivers
    transport
    tab
    _one_shot
    targetId
    on_message
    browserContextId

    PPS

    after some brute forcing I got something useful:

    print "\n\nPORT: ", ${$mech->target()}{transport}{port},"\n",
          "IS CONNECTED: ",${$mech->target()}{transport}{is_connected},"\n",
          "LISTENER: ",${$mech->target()}{transport}{listener},"\n",
          "TAB: ",${$mech->target()}{transport}{tab},"\n";

    # gives
    PORT: 9222
    IS CONNECTED: 1
    LISTENER: HASH(0x498a420)
    TAB: HASH(0x4bfbe78)
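An alternative to digging through the object's internals is to probe the debugging port from outside. The sketch below assumes the browser exposes the DevTools HTTP interface on 127.0.0.1 and asks its /json/version endpoint; devtools_info is a hypothetical helper name, not part of WWW::Mechanize::Chrome. It only tells you whether something answers on that port, not which command line was used to start it.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Ask the DevTools HTTP endpoint for its version info. Returns a hash
# reference (with keys such as "Browser") when a browser answers on the
# port, or undef when nothing usable is listening there.
# (hypothetical helper, not part of WWW::Mechanize::Chrome)
sub devtools_info {
    my ($port, $host) = @_;
    $host //= '127.0.0.1';
    my $res = HTTP::Tiny->new( timeout => 2 )->get("http://$host:$port/json/version");
    return undef unless $res->{success};
    my $info = eval { decode_json( $res->{content} ) };
    return ref($info) eq 'HASH' ? $info : undef;
}

my $port = 9222;
if ( my $info = devtools_info($port) ) {
    print "Port $port answers: $info->{Browser}\n";
}
else {
    print "Nothing answers on port $port; was Chrome started with --remote-debugging-port=$port?\n";
}
```

Running this before constructing the $mech object gives an early hint whether tab => ... has a browser to connect to.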


      Thank you very much for the report and the test program! This helps me immensely to reproduce the problem.

      The autoclose issue is a bug. The fix for it is simple - the shutdown is always killing Chrome instead of checking whether it should try to do that:

      sub close {
          ...
          if( $_[0]->{autoclose} ) {
              $_[0]->kill_child( $_[0]->{cleanup_signal}, $pid, $_[0]->{wait_file} );
          }
      }

      But now I have to think about what my test suite tests and how I can test this case better... Then I will release a fixed version :)

      Thanks again - this should now be fixed with 0.56.