I'll cut to the chase: the same Perl code runs a lot slower under Perl 5.8.0 (and Perl 5.8.5). How much slower? In the case I'm presenting here, about five hundred times slower. Since I cannot believe that any build options could make Perl run 500 times slower or faster, one of the following must hold:
Since my code resembles Chapter 3 of a Perl textbook (regular expressions and file reading/writing), the second one must be true: Perl 5.8.x is not backward compatible. I would expect that for obscure or deprecated functionality, but not for regular expressions. Regular expressions are the main reason I chose Perl; if they break down, I might as well forget about Perl altogether and stick to Java and the ubiquitous Python (which is already preferred over Perl in web development).

Out of decency towards the Perl community, I felt obliged to take some time before jumping to conclusions and to run my tests on three versions of Perl: 5.6.1, 5.8.0 (shipped with Red Hat 9), and 5.8.5, a very lightweight hand-compiled build, tuned for performance and without extra specialized functionality. I have neither the time nor the resources to repeat the test on another operating system. That said, Red Hat 9 is a standard Linux distribution, and this outrageous behavior is most probably common to many others, if not all.

The following tables show the profiling output obtained by running the script under perl -d:DProf and then running dprofpp tmon.out.

Perl 5.6.1
Total Elapsed Time = 0.080048 Seconds
  User+System Time = 0.080048 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 87.4   0.070  0.070     40   0.0018 0.0018  main::extract
 12.4   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00   0.000  0.010      2   0.0000 0.0050  main::BEGIN
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::bits
 0.00   0.000  0.000      1   0.0000 0.0000  Exporter::import
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::bits
Perl 5.8.0
Total Elapsed Time = 123.5199 Seconds
  User+System Time = 39.62993 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 97.1   38.49 38.520     40   0.9622 0.9630  main::extract
 0.05   0.020  0.020      1   0.0200 0.0200  utf8::SWASHNEW
 0.03   0.010  0.010      1   0.0100 0.0100  utf8::AUTOLOAD
 0.00       - -0.000      1        -      -  utf8::SWASHGET
 0.00       - -0.000      1        -      -  Exporter::import
 0.00       - -0.000      1        -      -  warnings::unimport
 0.00       - -0.000      2        -      -  warnings::import
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      2        -      -  strict::unimport
 0.00       - -0.000      4        -      -  strict::bits
 0.00       - -0.000      2        -      -  strict::import
 0.00       - -0.000      3        -      -  main::BEGIN
 0.00       - -0.000      5        -      -  utf8::BEGIN
Perl 5.8.5
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 98.4   0.630  0.630     40   0.0157 0.0157  main::extract
 1.56   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  strict::bits
 0.00       -  0.010      2        - 0.0050  main::BEGIN
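The slowdown factors can be recomputed directly from these profiles. Below is a minimal sketch that hard-codes the ExclSec values for main::extract and the whole-program times reported by dprofpp (the 5.8.5 profile has no total, so it is left out of the whole-program ratios). The hash names and the slowdown helper are only illustrative.

```perl
use strict;
use warnings;

# ExclSec for main::extract, copied from the dprofpp tables above.
my %extract_secs = (
    '5.6.1' => 0.070,
    '5.8.0' => 38.49,
    '5.8.5' => 0.630,
);

# Whole-program times (seconds) reported by dprofpp; 5.8.5 reported none.
my %elapsed  = ('5.6.1' => 0.080048, '5.8.0' => 123.5199);
my %cpu_time = ('5.6.1' => 0.080048, '5.8.0' => 39.62993);

# Ratio of a version's time to the 5.6.1 baseline, rounded to an integer.
sub slowdown {
    my ($times, $version) = @_;
    return sprintf '%.0f', $times->{$version} / $times->{'5.6.1'};
}

printf "extract, 5.8.5 vs 5.6.1: %sx\n", slowdown(\%extract_secs, '5.8.5');
printf "extract, 5.8.0 vs 5.6.1: %sx\n", slowdown(\%extract_secs, '5.8.0');
printf "program (elapsed), 5.8.0 vs 5.6.1: %sx\n", slowdown(\%elapsed, '5.8.0');
printf "program (CPU), 5.8.0 vs 5.6.1: %sx\n", slowdown(\%cpu_time, '5.8.0');
```

It prints 9x, 550x, 1543x and 495x respectively: the wall-clock ratio for 5.8.0 is about three times larger than the CPU-time ratio.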
Compared with Perl 5.6.1, the main::extract subroutine takes about 9 times longer under Perl 5.8.5 and about 550 times longer under Perl 5.8.0. In wall-clock time, the whole program took 1,543 times longer to finish under Perl 5.8.0 than under Perl 5.6.1 (about 495 times longer in user+system CPU time). You may be wondering what the Perl program is:
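use strict;
use warnings;

# Slurp the whole file into $text.
open (FILE, "a.txt") or die "Cannot open a.txt: $!";
my $text = "";
while (<FILE>) {
    $text .= $_;
}
close (FILE);

# Remove the first occurrence of "whatever" on each pass until none is left.
while (my ($one, $two) = extract ($text)) {
    $text = $one . $two;
}

sub extract {
    my ($text) = @_;
    # $1: text before the first "whatever"; $2: everything after it.
    if ($text =~ /(.*?)whatever(.*)/is) {
        return ($1, $2);
    }
    return ();
}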
As you can see, this code slurps a file and removes all occurrences of a certain word (`whatever'), case-insensitively. If you're wondering why Perl 5.8.0 took 2 minutes, it's not because the file was large: it was exactly 11,221 bytes.
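For comparison, here is a minimal sketch of the same removal done with substitution instead of capturing and re-joining the text. This is an alternative technique, not what the program above does, and the function names are hypothetical. The statement 1 while s/whatever//i removes the leftmost occurrence on each pass until none is left, which matches the extract() loop; a single s///gi pass is shown as well, because it behaves differently when a removal joins two fragments into a new occurrence.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the leftmost "whatever" (case-insensitive)
# repeatedly until none remain, like the extract() loop does.
sub remove_all {
    my ($text) = @_;
    1 while $text =~ s/whatever//i;
    return $text;
}

# Hypothetical helper: one global pass. An occurrence that is only formed
# by joining the text around a removal is left behind.
sub remove_global {
    my ($text) = @_;
    $text =~ s/whatever//gi;
    return $text;
}

my $sample = "a WhatEver b whatwhateverever c";
print remove_all($sample), "\n";     # prints "a  b  c"
print remove_global($sample), "\n";  # prints "a  b whatever c"
```

Which form is appropriate depends on whether re-formed occurrences should also disappear; the loop form preserves the original program's behavior.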

When the /.*? regular expression is changed to /^.*? (an explicitly anchored version of the same regexp), and a 5,000,000-byte file is used instead of the roughly 10,000-byte one, the profiler reports the following for the main::extract subroutine:

Perl 5.6.1
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 88.1   0.670  0.670      1   0.6700 0.6700  main::extract
Perl 5.8.0
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 95.0   2.490  2.510      1   2.4900 2.5100  main::extract
Perl 5.8.5
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 19.5   0.080  0.080      1   0.0800 0.0800  main::extract
Clearly, the little hat (the ^ anchor) largely evened out the differences between the three releases (although Perl 5.8.0 taking 3.7 times longer is reason enough NOT to upgrade). Perl 5.8.5, in its current build, was about eight times faster than Perl 5.6.1. The remaining differences come from the different versions and build parameters. To be more exact, here are the configuration summaries for the three releases:

Perl 5.6.1 Configuration Summary
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef
Perl 5.8.0 Configuration Summary
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio= d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef
Perl 5.8.5 Configuration Summary
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef
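To compare builds on your own machine, the same options can be read at run time from the core Config module (perl -V prints them too). A minimal sketch, assuming you only want the keys listed in the summaries above; on newer Perls some of them (for example use5005threads or d_sfio) may no longer be defined, in which case they show up as undef.

```perl
use strict;
use warnings;
use Config;

# The build options listed in the configuration summaries above.
my @keys = qw(usethreads use5005threads useithreads usemultiplicity
              useperlio d_sfio uselargefiles usesocks
              use64bitint use64bitall uselongdouble);

# Build a "key=value" summary; options not set in this build show as "undef".
sub config_summary {
    return join ' ', map {
        my $v = $Config{$_};
        "$_=" . (defined $v ? $v : 'undef');
    } @keys;
}

print "Perl $]: ", config_summary(), "\n";
```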
The conclusion is that regular expressions written like this:
$text =~ /(.*?)<whatever>/is
take several hundred times longer on 5.8.0 (about 550 times for main::extract in the first set of profiles). The same expressions written as
$text =~ /^(.*?)<whatever>/is
which mean the same thing when the /s modifier is used (look for the first occurrence of <whatever> and capture the text preceding it in $1), perform comparably across these versions: at most about 3.7 times slower on 5.8.0 in the second set of profiles.
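The two forms are equivalent here because of /s: with /s, a leading (.*?) can start at position 0 and absorb everything before the first occurrence, so if the unanchored pattern matches anywhere, it also matches at the start of the string, with the same captures. A minimal sketch that checks this on a few made-up sample strings; the helper names are hypothetical. Without /s the two forms can differ, as the last case shows.

```perl
use strict;
use warnings;

# Hypothetical helpers: return "$1|$2" for each form of the pattern,
# or "no match".
sub split_unanchored {
    my ($text) = @_;
    return $text =~ /(.*?)whatever(.*)/is ? "$1|$2" : 'no match';
}

sub split_anchored {
    my ($text) = @_;
    return $text =~ /^(.*?)whatever(.*)/is ? "$1|$2" : 'no match';
}

# Made-up sample strings, including a multi-line one.
my @samples = (
    "no keyword here",
    "first line\nsecond WHATEVER line\nwhatever again",
    "whatever",
);

# With /s, both forms should agree on every sample.
my $mismatches = grep { split_unanchored($_) ne split_anchored($_) } @samples;
print "with /s: $mismatches mismatches\n";

# Without /s, "." stops at newlines and ^ only matches at the very start,
# so the two forms can disagree on multi-line text.
my $multi    = "line one\nfoo whatever bar";
my $plain    = $multi =~ /(.*?)whatever/i  ? $1 : 'no match';
my $anchored = $multi =~ /^(.*?)whatever/i ? $1 : 'no match';
print "without /s: '$plain' vs '$anchored'\n";
```

It reports 0 mismatches with /s, while without /s the unanchored form captures "foo " from the second line and the anchored form finds no match.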

In my honest opinion, this is not an issue of bad code versus good code; it is an issue of good Perl versus bad Perl. I only discovered this strange behavior by using a standard regular expression and moving from 5.6 to 5.8, which are consecutive versions. If the changes are this dramatic when upgrading to the next version, what is one to expect of Perl in other respects?


I can tell you one thing: if IBM had written Perl, this would never have happened. Maybe there aren't enough alpha and beta testers; maybe the developers don't have the time to write enough warning messages. What's certain is that Perl is not seen as a product, and the members of the community it attempts to serve are not looked upon as customers. And that's the very difference between open source and closed source software. What good is it being free if it deceives its users about the problems it claims to solve?

In reply to Why does a Perl 5.6 regex run a lot slower on Perl 5.8? by perldeveloper
