I'll cut to the chase: the same Perl code runs a lot slower under Perl 5.8.0 (and Perl 5.8.5). What does "a lot" mean? In the case I'm presenting here, it means about five hundred times slower. Since I cannot believe that there are compile-time options that can make Perl run 500 times slower or faster, one of the following must hold:

I'm using non-standard code (like attributes, prototypes etc.)
Perl 5.8.x is not backward compatible, at least when it comes to running the same code "about as fast as the previous version does"

Since my code resembles Chapter 3 of a Perl textbook (regular expressions and reading/writing files), the second one must be true: Perl 5.8.x is not backward compatible. I would expect that for obscure or deprecated functionality, but not for regular expressions. Regular expressions are the main reason why I chose Perl; if they break down, I might as well forget about Perl altogether and stick to Java and the ubiquitous Python (which is already the preferred choice over Perl in web development).

Out of decency towards the Perl community, I feel obliged to spend some time before jumping to conclusions and to run my tests on three versions of Perl: 5.6.1, 5.8.0 (shipped with RedHat 9), and 5.8.5, a very light hand-made build, compiled for performance with no extra specialized functionality. However, I have neither the time nor the resources to run the test on another operating system. RedHat 9 is nevertheless a standard Linux operating system, and this outrageous behavior is most probably common to many others, if not all.

The following tables show the profiling output obtained by running the script under perl -d:DProf and then running dprofpp tmon.out.

Perl 5.6.1

Total Elapsed Time = 0.080048 Seconds
  User+System Time = 0.080048 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 87.4   0.070  0.070     40   0.0018 0.0018  main::extract
 12.4   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00   0.000  0.010      2   0.0000 0.0050  main::BEGIN
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::bits
 0.00   0.000  0.000      1   0.0000 0.0000  Exporter::import
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::bits

Perl 5.8.0

Total Elapsed Time = 123.5199 Seconds
  User+System Time = 39.62993 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 97.1   38.49 38.520     40   0.9622 0.9630  main::extract
 0.05   0.020  0.020      1   0.0200 0.0200  utf8::SWASHNEW
 0.03   0.010  0.010      1   0.0100 0.0100  utf8::AUTOLOAD
 0.00       - -0.000      1        -      -  utf8::SWASHGET
 0.00       - -0.000      1        -      -  Exporter::import
 0.00       - -0.000      1        -      -  warnings::unimport
 0.00       - -0.000      2        -      -  warnings::import
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      2        -      -  strict::unimport
 0.00       - -0.000      4        -      -  strict::bits
 0.00       - -0.000      2        -      -  strict::import
 0.00       - -0.000      3        -      -  main::BEGIN
 0.00       - -0.000      5        -      -  utf8::BEGIN

Perl 5.8.5

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 98.4   0.630  0.630     40   0.0157 0.0157  main::extract
 1.56   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  strict::bits
 0.00       -  0.010      2        - 0.0050  main::BEGIN

The main::extract subroutine takes about 9 times longer under Perl 5.8.5, and about 550 times longer under Perl 5.8.0, than under Perl 5.6.1. Measured by total elapsed time, the program took 1,543 times longer to finish under Perl 5.8.0 than under Perl 5.6.1. You may be wondering what the Perl program is:

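use strict;
use warnings;

# Slurp the whole input file into one string.
open (FILE, "a.txt") or die "Cannot open a.txt: $!";
my $text = "";
while (<FILE>) {
     $text .= $_;
}

close (FILE);

# Remove one occurrence of 'whatever' per iteration until none is left.
while (my ($one, $two) = extract ($text)) {
     $text = $one . $two;
}

# Return the text before and after the first 'whatever' (case-insensitive),
# or an empty list if there is no match.
sub extract {
     my ($text) = @_;

     if ($text =~ /(.*?)whatever(.*)/is) {
          return ($1, $2);
     }

     return ();
}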

As you can see, this code slurps a file and removes all occurrences of a certain word (`whatever'). If you're wondering why Perl 5.8.0 took 2 minutes, it's not because the file was large: its size was exactly 11,221 bytes (about eleven thousand).
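
To make the loop's effect concrete, here is a minimal sketch that runs it on a small made-up string and compares the result with a repeated single substitution. The sample text and the helper names strip_with_loop and strip_with_subst are assumptions for illustration, not part of the original test. A repeated substitution is used rather than s///g because removing one occurrence can join the surrounding text into a new one, which the loop also removes.

```perl
use strict;
use warnings;

# Same extraction routine as in the post: split around the first 'whatever'.
sub extract {
    my ($text) = @_;
    return ($1, $2) if $text =~ /(.*?)whatever(.*)/is;
    return ();
}

# The post's loop: drop one occurrence per call until none is left.
sub strip_with_loop {
    my ($text) = @_;
    while (my ($one, $two) = extract($text)) {
        $text = $one . $two;
    }
    return $text;
}

# Assumed equivalent: repeat a single case-insensitive substitution.
sub strip_with_subst {
    my ($text) = @_;
    1 while $text =~ s/whatever//i;
    return $text;
}

# Made-up sample, including an occurrence that only forms after a removal.
my $sample = "foo WHATEVER bar\nwhatwhateverever baz";
print strip_with_loop($sample) eq strip_with_subst($sample)
    ? "same result\n"
    : "different result\n";
```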

When the regular expression starting with /(.*?) is changed to start with /^(.*?) (an explicitly anchored version of the same regexp), and a 5,000,000-byte file is used instead of the 11,221-byte one, here are the profiling results for the main::extract subroutine:

Perl 5.6.1

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 88.1   0.670  0.670      1   0.6700 0.6700  main::extract

Perl 5.8.0

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 95.0   2.490  2.510      1   2.4900 2.5100  main::extract

Perl 5.8.5

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 19.5   0.080  0.080      1   0.0800 0.0800  main::extract

Clearly the caret evened out most of the differences between the three releases (although taking 3.7 times longer under Perl 5.8.0 is reason enough NOT to upgrade). Perl 5.8.5, in its current build, was faster than Perl 5.6.1. The differences exist on account of the different versions and different build parameters. To be more exact, here are the configuration summaries for the three releases:

Perl 5.6.1 Configuration Summary

    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef

Perl 5.8.0 Configuration Summary

    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio= d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef

Perl 5.8.5 Configuration Summary

    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
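
For anyone who wants to compare their own build against these summaries, the same options can be read from the core Config module. This is a minimal sketch; the helper name config_summary is an assumption for illustration, and options that a given Perl version does not know about are simply shown as undef.

```perl
use strict;
use warnings;
use Config;   # core module exposing the build-time configuration

# Build options that appear in the configuration summaries.
my @options = qw(usethreads use5005threads useithreads usemultiplicity
                 useperlio d_sfio uselargefiles usesocks
                 use64bitint use64bitall uselongdouble);

# Format each option as name=value, showing 'undef' for unset values.
sub config_summary {
    return join ' ', map {
        my $value = $Config{$_};
        "$_=" . (defined $value ? $value : 'undef');
    } @_;
}

print "perl $Config{version}\n";
print config_summary(@options), "\n";
```

Running this under each installed perl binary gives lines that can be compared directly.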

The conclusion is that all regular expressions written like this:

$text =~ /(.*?)<whatever>/

take hundreds of times longer on 5.8.0 (about 550 times longer in the first test above). The same expressions written as

$text =~ /^(.*?)<whatever>/

which obviously mean the same thing (look for the first occurrence of <whatever> and save the text preceding it in the corresponding variables), perform comparably on 5.6.1 and 5.8.0 (within a factor of four in the second test above).
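
Here is a minimal sketch for timing both forms under whichever perl is at hand. The input string is made up (filler of roughly the same size as the 11,221-byte file, with a single marker near the end), and the helper names split_unanchored, split_anchored and time_it are assumptions for illustration. It measures wall-clock time with Time::HiRes rather than profiling with DProf, so the numbers are not directly comparable with the tables above.

```perl
use strict;
use warnings;
use Time::HiRes ();   # core since 5.8; assumed to be installed on older perls

# Made-up input: filler without the marker, then the marker near the end.
my $text = ("lorem ipsum dolor sit amet\n" x 400) . "whatever" . " tail";

# Unanchored form, as in the original extract().
sub split_unanchored {
    my ($text) = @_;
    return $text =~ /(.*?)whatever(.*)/is ? ($1, $2) : ();
}

# Anchored form with the leading caret.
sub split_anchored {
    my ($text) = @_;
    return $text =~ /^(.*?)whatever(.*)/is ? ($1, $2) : ();
}

# Run a code reference $count times and return elapsed wall-clock seconds.
sub time_it {
    my ($code, $count) = @_;
    my $start = Time::HiRes::time();
    $code->() for 1 .. $count;
    return Time::HiRes::time() - $start;
}

printf "perl %vd\n", $^V;
printf "unanchored: %.4f s\n", time_it(sub { split_unanchored($text) }, 40);
printf "anchored:   %.4f s\n", time_it(sub { split_anchored($text) }, 40);
```

The count of 40 mirrors the number of extract calls in the first profile; both subs should return the same captures, so any difference in the output is purely one of speed.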

In my honest opinion, this is not an issue of bad code versus good code; it is an issue of good Perl versus bad Perl. I discovered this strange behavior using only standard regular expressions while moving from 5.6 to 5.8, which are consecutive versions. If the changes are so dramatic when upgrading to the next version, what is one to expect of Perl in other respects?

I can tell you one thing: if IBM had written Perl, this would never have happened. Maybe there aren't enough alpha and beta testers; maybe developers don't have the time to write enough warning messages. What's certain is that Perl is not seen as a product, and the members of the community it attempts to serve are not looked upon as customers. And that's the very difference between open source and closed source software. What good is it being free if it deceives its users about the problems it claims to solve?

In reply to Why does a Perl 5.6 regex run a lot slower on Perl 5.8? by perldeveloper