ray.rick.mini has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm moving a JS web scraper to a newly created VM. The code is based on the Perl module named in the title (WWW::Mechanize::Firefox). When I run the scraper on the VM, it is very slow: on a physical machine it grabs 800 pages in 20 minutes, but on the VM it grabs only 1 or 2 pages per minute. I also get many crashes complaining about a dead WWW::Mechanize::Firefox object. Can you help me dig into this issue? I specify the VM is a perfect copy of the physical one. Thanks, RAY
  • Comment on WWW:Mechanize::Firefox slow into Wmare/Virtualbox VMs

Re: WWW:Mechanize::Firefox slow into Wmare/Virtualbox VMs
by stevieb (Canon) on Jan 29, 2016 at 21:26 UTC
    Memory and CPU specs? That said, in a VM this sort of thing is usually disk I/O related, so I'd start there, after first making sure there are no network conflicts.

    In other words, this doesn't sound like a Perl issue at all.

      I second that. Going from 40 pages per minute to 2 is quite a performance hit. Assuming the virtual machine runs on the same physical machine, the CPU cannot be the cause, unless you have a 20-core machine and the virtual machine has only one core assigned.

      Can you fit a second network card and bridge it to the VM? The concept is explained here:

      https://blogs.oracle.com/fatbloke/entry/networking_in_virtualbox1

      If you cannot, there might be a faster network emulation. Read about virtio-net in the manual (there are also performance tips at the end):

      https://www.virtualbox.org/manual/ch06.html

Re: WWW:Mechanize::Firefox slow into Wmare/Virtualbox VMs
by Corion (Patriarch) on Jan 30, 2016 at 08:16 UTC

    Are you sure that the issue is specific to WWW::Mechanize::Firefox at all? Does accessing the VM manually through Firefox work faster?

Re: WWW:Mechanize::Firefox slow into Wmare/Virtualbox VMs
by exilepanda (Friar) on Jan 30, 2016 at 16:14 UTC
    If your physical machine manages 800 pages in 20 minutes and your VM image is a perfect copy of it, then the only difference seems to be the VM itself.

    Are you on Windows, though? Windows limits how many live network connections you can have at a time (4, as far as I can remember, in the XP era, and 20 at most on Win7 Pro after some registry hack), so you may get broken results, which would cause those complaints. Consider that VM-to-Windows consumes one channel and Windows-to-web consumes another, and so on, doubling the network load (not precisely like that, but that is the general idea). If a round trip does not finish fast enough, the subsequent queries all get jammed.

    Maybe you can check how many threads or forks your scraper runs at the same time.

Re: WWW:Mechanize::Firefox slow into Wmare/Virtualbox VMs
by ray.rick.mini (Sexton) on Jan 30, 2016 at 22:36 UTC

    Hello Monks, and thank you for the feedback. Luckily, I managed to solve my performance issues: I had to add a couple of VMware parameters, and I also updated the guest OS and open-vm-tools to the latest unstable release. Now the scraper is doing well; not as fast as on the physical machine, but about 50% of its speed, which is acceptable for me. I have a question: I have been seeing many errors coming from this piece of code:

    sub wait_until_xpath(){
        my $xpath=shift;
        my $retries = 90;
        while ($retries-- and ! $mech->is_visible( xpath => $xpath )) {
            print "wax waited..\n";
            usleep(50*$msec);
        };
        warn "wait_until_xpath(): timeout at retry #$retries" if 0 > $retries;
    }

    I have now wrapped the body in an eval block, and it magically fixed it. It works, but I don't know why! :) Please comment.

    sub wait_until_xpath(){
        eval{
            my $xpath=shift;
            my $retries = 90;
            while ($retries-- and ! $mech->is_visible( xpath => $xpath )) {
                print "wax waited..\n";
                usleep(50*$msec);
            };
            warn "wait_until_xpath(): timeout at retry #$retries" if 0 > $retries;
        }
    }

      eval is a way of hiding errors. You are now not seeing the error anymore.
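
      To make that concrete, here is a minimal sketch (not the poster's actual code) of a polling loop that still uses eval but reports the trapped error instead of discarding it. The helper name `wait_until`, its parameters, and the simulated error message are hypothetical and chosen only for illustration. The check is passed in as a code reference so the sketch runs without a browser.

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

# Hypothetical helper: call $check up to $retries times, sleeping
# $delay_usec microseconds between attempts.
# Returns (1, undef) on success, (0, undef) on timeout,
# and (0, $error) if $check died.
sub wait_until {
    my ($check, $retries, $delay_usec) = @_;
    for my $attempt (1 .. $retries) {
        my $result;
        # The trailing "1" makes eval return true only if $check did not die.
        my $lived = eval { $result = $check->(); 1 };
        unless ($lived) {
            my $err = $@ || 'unknown error';
            return (0, $err);    # hand the error back instead of hiding it
        }
        return (1, undef) if $result;
        usleep($delay_usec);
    }
    return (0, undef);           # retries exhausted without success
}

# Simulated failure standing in for a check that dies.
my ($ok, $err) = wait_until(sub { die "simulated: connection lost\n" }, 3, 1000);
print "check died: $err" if defined $err;
```

      In the scraper, the check would presumably be something like `sub { $mech->is_visible( xpath => $xpath ) }`, so whatever exception the eval is currently swallowing would be returned to the caller and could be logged.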

        Thanks Corion, this module is awesome and very useful to me!
Re: WWW:Mechanize::Firefox slow into Wmare/Virtualbox VMs
by Anonymous Monk on Jan 29, 2016 at 20:29 UTC

    I specify the VM is a perfect copy of the physical one..

    What?

      What?