I'll cut to the chase: the same Perl code runs a lot slower under Perl 5.8.0 (and Perl 5.8.5)
than it does under Perl 5.6.1. What does "a lot" mean? In the case I'm presenting here, it means
about five hundred times slower. Since I cannot believe that there are compile-time options
that can make Perl run 500 times slower or faster, one of the following must hold:
Since my code resembles Chapter 3 in a Perl textbook (regular expressions and IO
reading/writing), the second one must be true:
I would expect that when it comes to obscure functionality or old deprecated functionality,
but I wouldn't expect it when it comes to regular expressions. Regular expressions are
the main reason why I chose Perl; if that breaks down, I might as well forget about Perl
altogether and stick to Java and the ubiquitous Python (which is already the preferred choice
over Perl in web development).
Out of decency towards the Perl community, I feel obliged to spend some time before
jumping to conclusions, and to examine my tests on three versions of Perl: 5.6.1, 5.8.0
(shipped with RedHat 9), and 5.8.5, a very light hand-made build, compiled
for performance with no extra specialized functionality. However, I have neither the time
nor the resources to run the test on another operating system. RedHat 9 is nevertheless a
standard Linux operating system, and this outrageous behavior is most probably common to
many others, if not all.
The following tables outline debugging information obtained by running perl -d:DProf and then dprofpp tmon.out.
Perl 5.6.1
Total Elapsed Time = 0.080048 Seconds
  User+System Time = 0.080048 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 87.4   0.070  0.070     40   0.0018 0.0018  main::extract
 12.4   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00   0.000  0.010      2   0.0000 0.0050  main::BEGIN
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::bits
 0.00   0.000  0.000      1   0.0000 0.0000  Exporter::import
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::bits
Perl 5.8.0
Total Elapsed Time = 123.5199 Seconds
  User+System Time = 39.62993 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 97.1   38.49 38.520     40   0.9622 0.9630  main::extract
 0.05   0.020  0.020      1   0.0200 0.0200  utf8::SWASHNEW
 0.03   0.010  0.010      1   0.0100 0.0100  utf8::AUTOLOAD
 0.00       - -0.000      1        -      -  utf8::SWASHGET
 0.00       - -0.000      1        -      -  Exporter::import
 0.00       - -0.000      1        -      -  warnings::unimport
 0.00       - -0.000      2        -      -  warnings::import
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      2        -      -  strict::unimport
 0.00       - -0.000      4        -      -  strict::bits
 0.00       - -0.000      2        -      -  strict::import
 0.00       - -0.000      3        -      -  main::BEGIN
 0.00       - -0.000      5        -      -  utf8::BEGIN
Perl 5.8.5
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 98.4   0.630  0.630     40   0.0157 0.0157  main::extract
 1.56   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  strict::bits
 0.00       -  0.010      2        - 0.0050  main::BEGIN
The main::extract subroutine takes about 9 times longer under Perl 5.8.5, and about 550
times longer under Perl 5.8.0, than it does under Perl 5.6.1. The program itself took 1,543 times longer to finish under Perl 5.8.0 than it did under Perl 5.6.1. You may be wondering what the Perl
program is:
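use strict;
use warnings;

# Slurp the whole input file into $text.
open (FILE, "a.txt") or die "Cannot open a.txt: $!";
my $text = "";
while (<FILE>) {
    $text .= $_;
}
close (FILE);

# Keep cutting the first occurrence of the word out of $text until none is left.
while (my ($one, $two) = extract ($text)) {
    $text = $one . $two;
}

# Return the text before and after the first match, or an empty list if there is none.
sub extract {
    my ($text) = @_;
    if ($text =~ /(.*?)whatever(.*)/is) {
        return ($1, $2);
    }
    return ();
}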
As you can see, this code slurps a file and removes all occurrences of a certain word
(`whatever'). If you're wondering why Perl 5.8.0 took 2 minutes, it's not because the
file was large. The size of the file was exactly 11,221 bytes (roughly eleven thousand).
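To make the loop concrete, here is a minimal sketch that runs the same extract routine on a tiny in-memory string instead of a.txt. The sample string and the $calls counter are illustrative assumptions, not part of the original test. It shows that the loop makes one call per occurrence plus a final call that fails to match, which would be consistent with the 40 calls reported in the profiles above.

```perl
use strict;
use warnings;

# Same shape as the extract routine above.
sub extract {
    my ($text) = @_;
    if ($text =~ /(.*?)whatever(.*)/is) {
        return ($1, $2);
    }
    return ();
}

# Hypothetical sample input: two occurrences, one of them upper case.
my $text  = "alpha whatever beta\nWHATEVER gamma";
my $calls = 0;

while (1) {
    $calls++;                        # count every call, including the last failing one
    my @parts = extract($text);
    last unless @parts;              # empty list: no occurrence left
    $text = $parts[0] . $parts[1];   # glue the text before and after the match
}

print "calls: $calls\n";
print "result: [$text]\n";
```

Each successful call rescans the string from the beginning, so the amount of regex work grows with the number of occurrences as well as with the file size.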
When the /.*? regular expression is changed to /^.*? (an explicit
version of the same regexp), and instead of a 10,000 byte file, a 5,000,000 byte file is used,
here are the debugging results for the main::extract subroutine:
Perl 5.6.1
%Time ExclSec CumulS #Calls sec/call Csec/c Name
88.1 0.670 0.670 1 0.6700 0.6700 main::extract
Perl 5.8.0
%Time ExclSec CumulS #Calls sec/call Csec/c Name
95.0 2.490 2.510 1 2.4900 2.5100 main::extract
Perl 5.8.5
%Time ExclSec CumulS #Calls sec/call Csec/c Name
19.5 0.080 0.080 1 0.0800 0.0800 main::extract
It's obvious that the little hat (^) balanced out the differences between the three releases (although 3.7 times longer with Perl 5.8.0 is reason enough NOT to upgrade). Perl 5.8.5, in its current build, was faster than Perl 5.6.1. The differences exist on account of different versions and different build parameters. To
be more exact, here are the configuration summaries for the three releases:
Perl 5.6.1 Configuration Summary
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
Perl 5.8.0 Configuration Summary
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio= d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
Perl 5.8.5 Configuration Summary
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
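For anyone who wants to compare their own build against the summaries above, the following is a small sketch that reads the same settings from the core Config module. The build_summary helper and the key list are my own illustration; some of these keys may simply be absent on other Perl versions, in which case they are printed as undef.

```perl
use strict;
use warnings;
use Config;

# The build options listed in the configuration summaries above.
my @keys = qw(usethreads use5005threads useithreads usemultiplicity
              useperlio d_sfio uselargefiles usesocks
              use64bitint use64bitall uselongdouble);

# Hypothetical helper: one "key=value" string per option.
sub build_summary {
    return map {
        my $value = $Config{$_};
        "$_=" . (defined $value ? $value : 'undef');
    } @keys;
}

print "Perl $Config{version} Configuration Summary\n";
print "  $_\n" for build_summary();
```

Running perl -V prints the same information along with much more detail about the build.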
The conclusion is that all regular expressions written like this:
$text =~ /(.*?)<whatever>/
take hundreds to over a thousand times longer on 5.8.0 than on 5.6.1. The same expressions written as
$text =~ /^(.*?)<whatever>/
which obviously mean the same thing (look for the first occurrence of <whatever>
and save the text preceding it in the corresponding variables), show no such blow-up:
the gap between 5.6.1 and 5.8.0 shrinks to about 3.7 times.
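Here is a rough sketch for checking both claims on your own interpreter: that the two forms yield the same result, and how long each one takes. The sample data, the strip_all helper and the use of Time::HiRes are assumptions for illustration (Time::HiRes ships with 5.8 but may need to be installed separately on 5.6.1). The timings will depend entirely on the Perl version and build, so none are shown here.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical test data: roughly 10,000 bytes of filler with 39 occurrences.
my $sample = join '', map { "some filler text\n" x 15 . "whatever\n" } 1 .. 39;

# Remove every occurrence using the given pattern.
# Returns the stripped text and the elapsed wall-clock seconds.
sub strip_all {
    my ($text, $re) = @_;
    my $start = time;
    while ($text =~ $re) {
        $text = $1 . $2;    # text before the match plus text after it
    }
    return ($text, time - $start);
}

my ($plain_out,    $plain_secs)    = strip_all($sample, qr/(.*?)whatever(.*)/is);
my ($anchored_out, $anchored_secs) = strip_all($sample, qr/^(.*?)whatever(.*)/is);

print "same result: ", ($plain_out eq $anchored_out ? "yes" : "no"), "\n";
printf "unanchored: %.6f s\n", $plain_secs;
printf "anchored:   %.6f s\n", $anchored_secs;
```

The final iteration, where nothing matches any more, is the one worth watching: without the anchor the engine is allowed to retry the match from every starting position before giving up, while the anchored form can only start at the beginning of the string.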
In my honest opinion, this is not an issue of bad code and good code; this is an issue of
good Perl and bad Perl.
I've only discovered this strange behavior using standard regular expressions and moving
from 5.6 to 5.8, which are consecutive versions. If the changes are so dramatic when
upgrading to the next version, what is one to expect of Perl in other respects?
I can tell you one thing: if IBM had written Perl, this would have never happened. Maybe
there aren't enough alpha and beta testers, maybe developers don't have the time to write
enough warning messages. What's certain is that Perl is not seen as a product, and the
members of the community it attempts to serve are not being looked upon as customers. And
that's the very difference between open source and closed source software. What good is
it being free if it deceives its users about the problems it claims to solve?