http://qs1969.pair.com?node_id=145790

After posting RFC CGI.pm refactoring a number of monks raised a very valid point: "Why bother?". CGI.pm is stable and its use is widespread, so why change? One good reason might be speed. As a number of monks have requested speed comparison benchmarks against CGI.pm, here are a range of them. Executive summary: CGI::Simple is roughly twice as fast as CGI.pm, depending on what you measure.

To use a module you need to load it, make an object, and then extract data from that object. Depending on whether or not you are using mod_perl, you may also need a new process. Here are some tests:

Module Loading

To do the load test we need to trick perl into reloading the modules over and over, which we do by undefing %INC and requiring them in:

my ($start, $end, $cgitime, $simpletime);
my $n = 100;

$start = time;
do { require CGI; undef %INC } for 1 .. $n;
$end = time;
$cgitime = $end - $start;
print "Loading CGI $n times takes $cgitime seconds\n";

$start = time;
do { require CGI::Simple; undef %INC } for 1 .. $n;
$end = time;
$simpletime = $end - $start;
print "Loading CGI::Simple $n times takes $simpletime seconds\n";

__DATA__
# Standard distro
Loading CGI 100 times takes 40 seconds
Loading CGI::Simple 100 times takes 29 seconds

# With use strict commented out (CGI.pm does not use strict)
Loading CGI 100 times takes 39 seconds
Loading CGI::Simple 100 times takes 23 seconds

As you can see, CGI::Simple loads about 38% faster in standard form. To compare apples with apples I commented out the use strict; in CGI::Simple to remove the overhead of loading strict.pm, which makes CGI::Simple load about 70% faster than CGI.pm. While use strict; is excellent (and highly recommended) for development, it does carry a penalty at compile time.

Extracting data from a CGI object

Now let's have a look at how fast we can extract data from our CGI object:

use Benchmark;
use CGI qw/:cgi/;
use CGI::Simple;

$ENV{'QUERY_STRING'} = 'foo=bar&baz=boo';

timethese(10000, {
    'CGI'    => '$q = new CGI; $q->param("baz")',
    'Simple' => '$s = new CGI::Simple; $s->param("baz")',
});

timethese(10000, {
    'CGI'    => '$q = new CGI; $q->param("baz") for 1..10',
    'Simple' => '$s = new CGI::Simple; $s->param("baz") for 1..10',
});

__DATA__
Benchmark: timing 10000 iterations of CGI, Simple...
       CGI: 22 wallclock secs (21.70 usr + 0.00 sys = 21.70 CPU) @ 460.83/s (n=10000)
    Simple: 15 wallclock secs (15.05 usr + 0.00 sys = 15.05 CPU) @ 664.45/s (n=10000)
Benchmark: timing 10000 iterations of CGI, Simple...
       CGI: 32 wallclock secs (31.20 usr + 0.00 sys = 31.20 CPU) @ 320.51/s (n=10000)
    Simple: 18 wallclock secs (18.57 usr + 0.00 sys = 18.57 CPU) @ 538.50/s (n=10000)

As you can see, CGI::Simple is about 44% faster at making a new object and getting one param, and 68% faster at making a new object and getting 10 params.

Module loading and data extraction

In practical terms the module load time is often a choke point (depending on the application). Here is a test that combines module loading, object creation and parameter parsing (note we don't completely undef %INC this time, as we need to keep Benchmark.pm in %INC):

use Benchmark;

$ENV{'QUERY_STRING'} = 'foo=bar&baz=boo';

$cgi_code = <<'CODE';
%INC = ('Benchmark.pm' => 'C:/Perl/lib/Benchmark.pm');
require CGI;
$q = new CGI;
$q->param("baz") for 1..10;
CODE

$simp_code = <<'CODE';
%INC = ('Benchmark.pm' => 'C:/Perl/lib/Benchmark.pm');
require CGI::Simple;
$q = new CGI::Simple;
$q->param("baz") for 1..10;
CODE

timethese(100, { 'CGI' => $cgi_code, 'Simple' => $simp_code });

__DATA__
Benchmark: timing 100 iterations of CGI, Simple...
       CGI: 42 wallclock secs (43.50 usr + 0.00 sys = 43.50 CPU) @ 2.30/s (n=100)
    Simple: 18 wallclock secs (18.56 usr + 0.00 sys = 18.56 CPU) @ 5.39/s (n=100)

Unless you are using mod_perl this testing is still incomplete, as the startup time for a new process is not measured; this forms a significant part of serving a CGI request. Nonetheless there is a raw 134% performance improvement from using CGI::Simple over CGI.

New process creation testing

So, finally, here is some data on the whole shebang: start a new process, load the module, make a new object and get some param data out:

C:\>type cgi.pl
$ENV{'QUERY_STRING'} = 'foo=bar&baz=boo';
use CGI;
$q = new CGI;
$q->param("baz") for 1..10;

C:\>type cgi-simple.pl
$ENV{'QUERY_STRING'} = 'foo=bar&baz=boo';
use CGI::Simple;
$q = new CGI::Simple;
$q->param("baz") for 1..10;

C:\>type test.pl
my $start;
my $n = 100;

$start = time;
`perl c:\\cgi.pl` for 1..$n;
print "$n iterations using CGI takes ", time-$start, " seconds\n";

$start = time;
`perl c:\\cgi-simple.pl` for 1..$n;
print "$n iterations using CGI::Simple takes ", time-$start, " seconds\n";

C:\>perl test.pl
100 iterations using CGI takes 73 seconds
100 iterations using CGI::Simple takes 40 seconds

C:\>

So, testing the whole shebang, CGI::Simple is just over 80% faster at performing the same task.

However you want to look at this data, it equates to being able to handle a lot more requests on the same server. Changing from CGI to CGI::Simple is a one-line change, as the interface is identical.... Is it worth considering? For me, yes; for you, who knows?

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: CGI::Simple vs CGI.pm - Is twice as fast good enough?
by vladb (Vicar) on Feb 16, 2002 at 02:13 UTC
    I think so, yes: denying the importance of using slimmer and faster variants of common tools/modules is not the wisest thing to do. For one, I already see quite a number of CGI scripts which require a performance boost, and one way I might provide it is by simply converting from 'use CGI;' to 'use CGI::Simple;'. This will be especially easy to do since most of my CGI scripts don't require the extended features supplied with the standard CGI module.

    However, as far as 'extended' features go, is it not true that CGI doesn't really load them until they are first requested inside the main code? Basically, CGI keeps this %SUBS hash which contains a whole bunch of subroutine definitions; these are compiled only the first time each one is requested. I feel the author(s) of the module eagerly tried to drive this point across with this comment (ripped from CGI.pm):
##############################################################################
################# THESE FUNCTIONS ARE AUTOLOADED ON DEMAND ##################
##############################################################################

    This is followed by the infamous %SUBS hash:
%SUBS = (
    'read_from_client' => <<'END_OF_FUNC',
# Read data from a file handle
sub read_from_client {
    my ($self, $fh, $buff, $len, $offset) = @_;
    local $^W = 0;    # prevent a warning
    return undef unless defined($fh);
    return read($fh, $$buff, $len, $offset);
}
END_OF_FUNC

### MANY OTHER EXCITING SUBS ###
);
    So, say, even if I went the 'use CGI;' way, the only time wasted (provided I have no interest in calling the CGI::read_from_client() method) is that required to load the hash. The Perl parser wouldn't waste a nanosecond parsing the actual sub. This is a huge time saver compared to defining the subs outright (the standard way).
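    The pattern described above can be sketched in a few lines. This is a minimal, hypothetical OnDemand package (not CGI.pm's actual code): sub source lives in a hash as plain strings, and AUTOLOAD compiles a sub only on its first call.

```perl
use strict;
use warnings;

package OnDemand;

# Rarely used methods live here as strings: cheap to load,
# compiled only if something actually calls them.
our %SUBS = (
    'heavy_sub' => <<'END_OF_FUNC',
sub heavy_sub {
    my ($self, $x) = @_;
    return $x * 2;
}
END_OF_FUNC
);

sub new { bless {}, shift }

sub AUTOLOAD {
    our $AUTOLOAD;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    my $src = $SUBS{$name} or die "Undefined subroutine $AUTOLOAD";
    eval "package OnDemand; $src";    # compile on first use
    die $@ if $@;
    no strict 'refs';
    goto &$AUTOLOAD;                  # re-dispatch to the now-real sub
}

package main;

my $obj = OnDemand->new;
print $obj->heavy_sub(21), "\n";      # prints 42; compiled just now
```

    After the first call the sub is installed in the package for real, so subsequent calls bypass AUTOLOAD entirely.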

    I'm wondering if this would explain the fact that CGI::Simple is only 50% faster than CGI?

    "There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith

      When you use CGI.pm, 700 lines of code get compiled, plus 2400 lines of code also get put in the subs hash. While the code in the subs hash is not compiled, a fairly large hash must be generated, which takes both memory and time. CGI::Simple uses SelfLoader to avoid compiling methods that are rarely used: you place these methods below a __DATA__ token, and at compile time compilation stops at that token. As a result, when you use CGI::Simple only 300 lines of code actually get compiled.

      With SelfLoader, if you call one of the methods below the __DATA__ token, then (from the docs):

      The SelfLoader will read from the CGI::Simple::DATA filehandle to load in the data after __DATA__, and load in any subroutine when it is called. The costs are the one-time parsing of the data after __DATA__, and a load delay for the _first_ call of any autoloaded function. The benefits (hopefully) are a speeded up compilation phase, with no need to load functions which are never used.

      One of the neat things about SelfLoader is that if you know that you will regularly use methods x, y, and z you can easily tune the module by placing these above the data token. As a result they will be available without using SelfLoader and the runtime overhead of using SelfLoader need never be paid.
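      The hot/cold split described above can be sketched like this. The module name Lazy and its methods are made up for illustration; the module is written to a temp directory here only so the example is self-contained, where in real life Lazy.pm would simply live on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a tiny SelfLoader-based module to disk (illustration only).
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/Lazy.pm" or die "cannot write Lazy.pm: $!";
print $fh <<'MODULE';
package Lazy;
use SelfLoader;

# Hot method: above __DATA__, compiled as soon as the module loads.
sub fast { return 2 * $_[0] }

1;
__DATA__
# Cold method: SelfLoader compiles this only on the first call.
sub slow { return 3 * $_[0] }
MODULE
close $fh;

unshift @INC, $dir;
require Lazy;

print Lazy::fast(21), "\n";    # prints 42 - compiled at load time
print Lazy::slow(21), "\n";    # prints 63 - compiled on demand here
```

      Moving a method above or below __DATA__ is all the tuning there is: anything above pays its compile cost up front, anything below pays a small one-time cost on first call.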

      One of the not-so-neat things is that you have to load SelfLoader to use it, so there is a compile-time penalty that you must pay. Fortunately SelfLoader.pm is only 100 lines of code. I was sorely tempted to 'roll my own', as you can do this with much less code provided you do not have to 'cover all the bases' as the module does; this, however, went against the Perl philosophy of using modular code when available. Similarly, CGI::Simple uses IO::File->new_tmpfile() to generate a self-destructing temp file for file uploads, leaving all the details to that module. IO::File is pulled in via require, so you only load it if and when you need it.
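      A small sketch of what IO::File->new_tmpfile() provides: an anonymous read/write handle with no name on disk, reclaimed automatically when the handle goes away. The "spooled upload" string is just illustrative data.

```perl
use strict;
use warnings;
use IO::File;

# new_tmpfile returns a filehandle opened read/write on an anonymous
# temporary file; there is no filename to clean up afterwards.
my $fh = IO::File->new_tmpfile or die "cannot create temp file: $!";

print {$fh} "spooled upload data\n";   # e.g. an incoming file upload
$fh->flush;
$fh->seek(0, 0);                       # rewind to read it back
my $line = <$fh>;
print $line;                           # prints "spooled upload data"
# the file self-destructs when $fh goes out of scope
```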

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        tachyon wrote:

        CGI::Simple uses SelfLoader to avoid compiling methods that are rarely used. You do this by placing these methods below a __DATA__ token. At compile time compilation stops at the __DATA__ token.

        I didn't notice this part at first. mod_perl scripts cannot contain __DATA__ tokens. Do you have a solution for this? I suppose you could make a separate mod_perl implementation without the __DATA__ token. Since the performance issue you're resolving is load time, it really doesn't apply in that instance. However, then you have CGI::Simple, CGI::Simple::Standard, and CGI::Simple::mod_perl. I don't see a problem with that if you really need those namespaces to address these issues, but I wonder if others would object.

        Cheers,
        Ovid

        Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: CGI::Simple vs CGI.pm - Is twice as fast good enough?
by Dog and Pony (Priest) on Feb 16, 2002 at 12:27 UTC
    However you want to look at this data, it equates to being able to handle a lot more requests on the same server. Changing from CGI to CGI::Simple is a one-line change, as the interface is identical.... Is it worth considering? For me, yes; for you, who knows?

    I think it is very much worth considering if you (like me) usually use only the methods that are also in CGI::Simple. It can mean that your site is put to better use, happier users due to the speed-up, and in theory it can save costs by not needing to upgrade. Well, no, I don't really believe that to be true on account of just this change, but who knows? This, together with other optimizations, could very well amount to it.

    I also believe in doing things faster and better when it is worth it. Changing just one line in each script (a job for perl?) for that kind of gain sounds very good to me. And I just can't see how it would hurt (unless your module is broken, of course).

    Or maybe it is just my old days as a C=64 hacker that keep me liking any optimization that can be done. Those were the days... :) Although I still maintain that some of the same thinking is good, and needful, for successful web programming. Anyone who has visited (or failed to visit) a bogged-down site should agree.

      unless your module is broken, of course

      This is of course a grave concern. I am fairly confident that it is not too broken, and somewhat reassured that it passes the CGI.pm test suite as well as a large number of unit tests. I wrote unit tests (well, actually I wrote a perl script that wrote stubs for most of them) for every function in void, scalar and array context, so the test suite is extensive.

      I have checked the install under Win95, 98 and NT, Linux (Red Hat and Mandrake) and FreeBSD, and it installs fine.

      Despite this, any QA specialist will tell you there is always another bug; indeed Ovid has already found a blatant one (under mod_perl). I am hopeful that some of the monks who run development servers may consider the speed benefit worthwhile enough to experiment with it. It is only through widespread testing that you can truly start to feel confident that a piece of software is really stable.

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print