Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a VPS 2GB RAM 2 CPU. On a local VM with 1GB RAM and 1 CORE the following script takes just under 2 seconds to run:
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::UserAgent;
use Data::Printer;

my $ua = Mojo::UserAgent->new;
my $url = 'http://www.testURL.com';
my $page = $ua->get( $url )
    ->res
    ->dom;
my $dom = Mojo::DOM->new( $page );

for my $deal ( $dom->find('p.title > a.title')->each ){
    ProcessLink( $deal->attr('href'), $deal->text );
}

sub ProcessLink{
    warn "process link\n";
    my ( $linkURL, $linkText ) = @_;
    print "Link: $linkURL | Title: $linkText";
}
The exact same code on the VPS takes 1m 57s to run. Networking from this VPS is fine; using top I see the CPU is maxed out when Mojo::DOM->new is called. Is there anything I can do to make the run faster? A wget of the page from the shell is instant (the VPS is in a data center).

Replies are listed 'Best First'.
Re: Simple script goes from 2s to ~2mins when run on VPS
by Discipulus (Canon) on Mar 17, 2016 at 08:46 UTC
    This is guesswork, but I can share my own limited, empirical experience: VPS hosting, and virtualization in general, is designed to be economical (from the vendor's point of view), not performant. The principle is to make use of the machines' idle time and (hopefully) to have usage peaks that do not coincide. This means that a system intended to work constantly at a high rate (such as a big mail server) is not a good candidate for virtualization. Moreover, I suspect that while CPU and RAM are managed well enough to meet these objectives, the same is not true of storage.

    I use a little Perl program to distribute files over a bunch of machines: when I first introduced a virtualized OS into the group, I often noticed a perceptibly slower response from the virtualized OS compared with the other, old pieces of iron.

    Also empirically speaking, storage is often the bottleneck of a VPS, with hiccup-like stalls.

    That said, here is my speculation: I guess that Mojo::UserAgent->new is loading a lot of things, that is, Perl classes, which means files. Thinking of a heavy module I already have installed, I just tested perl -MMoose -MDevel::Symdump -E "my $obj = Devel::Symdump->rnew('Moose'); say for $obj->packages()" and got a screenful of classes, that is, files it can load (from the filesystem).

    This can be the difference between the wget call and your Perl program.

    You can test whether these assumptions are correct by setting up a little program in which Mojo::UserAgent->new is called before you start timing (Time::HiRes can be handy, or simply use Benchmark). Then time the $ua->get( $url ) part against a plain system call to wget, preferably several times in a row.
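A minimal sketch of such a test, assuming the URL is passed on the command line and wget is on the PATH. The timed helper is a hypothetical name, not part of Mojolicious. Mojo::UserAgent is loaded with require at run time so that module loading shows up as its own phase instead of being hidden in compile time:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code block, print how long it took, and return the block's result.
# (Hypothetical helper, introduced only for this sketch.)
sub timed {
    my ( $label, $code ) = @_;
    my $t0     = [gettimeofday];
    my $result = $code->();
    printf "%-22s %.3fs\n", $label, tv_interval($t0);
    return $result;
}

# The URL comes from the command line; nothing is fetched without one.
if ( my $url = shift @ARGV ) {
    # Loaded at run time so that module loading is timed as a separate phase.
    timed( 'load Mojo::UserAgent', sub { require Mojo::UserAgent; 1 } );
    my $ua  = timed( 'Mojo::UserAgent->new', sub { Mojo::UserAgent->new } );
    my $res = timed( 'fetch page',           sub { $ua->get($url)->res } );
    my $dom = timed( 'parse HTML',           sub { $res->dom } );

    # The same page fetched by wget, for comparison (output discarded).
    timed( 'wget', sub { system( 'wget', '-q', '-O', '/dev/null', $url ) } );
}
else {
    warn "usage: $0 URL\n";
}
```

Running it several times in a row on both machines should show which phase accounts for the difference: module loading, object creation, the fetch, or the HTML parse.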

    If the results are quite close, and faster than your original program, then object creation is the bottleneck. Interpreted languages have, in a sense, their own level of virtualization; in a virtualized OS you add another layer of virtualization to the whole process. Generally speaking, my opinion is that problems don't add, they multiply.

    If this is your case and you need to call such code frequently, you had better set up a little daemon that sits quietly in memory with its Mojo::UserAgent object already created, and feed it your URLs.
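One way this could look, as a rough sketch rather than a real daemon: a long-running loop that creates the user agent once and reads one URL per line from standard input. Feeding it through STDIN is an assumption made for brevity (a named pipe or socket would work too), and the selector is simply the one from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::UserAgent;

# Created once; every URL read below reuses this object.
my $ua = Mojo::UserAgent->new;

# Read one URL per line from standard input until it is closed.
while ( my $url = <STDIN> ) {
    chomp $url;
    next unless length $url;

    my $dom = $ua->get($url)->res->dom;
    for my $link ( $dom->find('p.title > a.title')->each ) {
        # attr('href') is undef when the attribute is missing.
        my $href = $link->attr('href') // '';
        print "Link: $href | Title: ", $link->text, "\n";
    }
}
```

This only saves the start-up cost (module loading and object creation). If the time goes into parsing the HTML, keeping the process alive will not help.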

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Simple script goes from 2s to ~2mins when run on VPS
by BrowserUk (Patriarch) on Mar 17, 2016 at 01:02 UTC

    Warning: I know nothing about Mojo; I've never used or worked with a VPS; and don't really have much of a clue about web work.

    My suggestion: if you are timing this from a client connecting to whatever this is, time the second instance rather than the first.

    That is: if you are timing this remotely, make one connection from your browser (or other client); then make a second connection and time that.

    Rationale: Virtual servers tend to be implemented as a front-end stub that causes a virtual machine instance to be loaded on demand. Thus, the first request to a VPS (within some time period) will require: a real server with 'space' to be allocated; the VM image to be located and loaded -- possibly from cold storage; that instance to be initialised; and then the request redirected to it.

    Whereas a second (or any subsequent) request within the same time period only needs to be redirected to the already existing and running VM instance.

    That is the only reason I can imagine for a remote VM to be 50 times slower than a local VM.
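A small sketch for checking this, assuming the thing being timed can be started as a command. The time_runs helper and the file names in the comment are hypothetical. It runs the same command several times back to back, so the first run can be compared with the later ones:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run the same command $count times; return the elapsed seconds of each run.
# (Hypothetical helper, introduced only for this sketch.)
sub time_runs {
    my ( $count, @cmd ) = @_;
    my @elapsed;
    for ( 1 .. $count ) {
        my $t0 = [gettimeofday];
        system(@cmd);
        push @elapsed, tv_interval($t0);
    }
    return @elapsed;
}

# Example: perl time_runs.pl perl scrape.pl   (both file names are placeholders)
if (@ARGV) {
    my @times = time_runs( 3, @ARGV );
    printf "run %d: %.3fs\n", $_ + 1, $times[$_] for 0 .. $#times;
}
```

If only the first run is slow, a cold start of some kind is the likely cause; if every run takes about two minutes, it is not.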


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Simple script goes from 2s to ~2mins when run on VPS (regexes)
by tye (Sage) on Mar 17, 2016 at 21:17 UTC

    The part of the code that you mention as being slow (and as using a lot of CPU) is the part that uses complex regexes to parse HTML. See Mojo::DOM::HTML's source for more details.

    Looking briefly at the regexes, I can imagine there being some versions of Perl where their performance is much worse than for other versions of Perl.

    But you don't even mention whether you are using the same versions of the Mojo modules between the two systems, so it could be that much different regexes are being used between the two.
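A quick way to compare the two systems is to print the Perl and Mojolicious versions on each. This is a sketch; module_version is a hypothetical helper. Only Mojolicious is queried, because Mojo::DOM and Mojo::UserAgent ship as part of that distribution:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the installed version of a module, or undef if it cannot be loaded.
# (Hypothetical helper, introduced only for this sketch.)
sub module_version {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; $module->VERSION };
}

printf "perl        %vd\n", $^V;

my $version = module_version('Mojolicious');
printf "Mojolicious %s\n", defined $version ? $version : 'not found';
```

Run it on the local VM and on the VPS and compare the two outputs.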

    I don't consider this next item to be very likely to be the problem. But we recently discovered that some VM systems default to acting like they support every single optional CPU feature. This is probably done with the thinking of preventing some code from breaking because it requires some specific optional CPU feature. However, the more common case (in our experience) is that code only makes use of certain CPU features when they are available and works around the absence when not. And this leads to code that tries to make use of advanced, optional CPU features that the VM must emulate because the physical CPUs don't support them. Such emulation of a general feature can be orders of magnitude slower than the code's workaround (for its specific use case). We have seen this lead to significant server performance problems.

    Simply configuring the VM to report support for only the CPU features that the physical CPUs actually support, made such performance problems disappear.
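To see which features a guest claims to support, the flags can be listed from inside the VM. This sketch assumes an x86 Linux guest, where /proc/cpuinfo has a "flags" line; cpu_flags is a hypothetical helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the sorted, de-duplicated CPU feature flags from
# /proc/cpuinfo-style text. (Hypothetical helper for this sketch.)
sub cpu_flags {
    my ($text) = @_;
    my %seen;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /^flags\s*:\s*(.*)$/;
        $seen{$_} = 1 for split ' ', $1;
    }
    return sort keys %seen;
}

my $path = '/proc/cpuinfo';    # assumption: Linux on x86
if ( open my $fh, '<', $path ) {
    local $/;                  # read the whole file at once
    print "$_\n" for cpu_flags( scalar <$fh> );
}
else {
    warn "cannot read $path: $!\n";
}
```

The output only shows what the guest is told it has. Whether a given feature is emulated has to be checked against the physical CPU, which usually means asking the provider.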

    - tye        

Re: Simple script goes from 2s to ~2mins when run on VPS
by perlfan (Parson) on Mar 17, 2016 at 00:31 UTC
    There are too many factors to consider. Who is your provider and what details are they giving you about your current tier?
Re: Simple script goes from 2s to ~2mins when run on VPS
by Anonymous Monk on Mar 16, 2016 at 21:40 UTC

    Is there anything I can do to make the run faster?

    Upgrade :) ensure equal versions :) then contact support