in reply to serious regex performance degradation after upgrade to perl 5.8 from 5.6

I was looking at the different timings and saw that Perl 5.8.x is about 4 times slower than Perl 5.6.x.

I could be wrong, but in Perl 5.8, UTF-8 makes strings allocate 4 bytes for each character, and the regex engine has to handle that too when it scans the string.

From the perlunicode POD:

UTF-8 is a variable-length (1 to 6 bytes, current character allocations require 4 bytes)...
And from the bytes POD:
As an example, when Perl sees $x = chr(400), it encodes the character in UTF-8 and stores it in $x. Then it is marked as character data, so, for instance, length $x returns 1. However, in the scope of the bytes pragma, $x is treated as a series of bytes - the bytes that make up the UTF8 encoding - and length $x returns 2:
So this code:
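use strict;
use warnings;

my $x = chr(400);                        # one character, stored internally as UTF-8
print 'Length: ', length $x, qq~\n~;     # character semantics
{
    use bytes;                           # byte semantics only inside this block
    print 'Length (bytes): ', length $x, qq~\n~;
}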
Has the output:
Length: 1
Length (bytes): 2
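The quoted POD describes UTF-8 as variable length, so the chr(400) example can be extended to a few more code points to see how many bytes each character actually takes once encoded. This is a minimal sketch for comparing against the 4-bytes-per-character assumption above; the helper name utf8_byte_length is made up for illustration, and it relies on the built-in utf8::encode, which converts a string's characters into their UTF-8 bytes in place.

```perl
use strict;
use warnings;

# Hypothetical helper: number of bytes a string occupies once encoded as UTF-8.
sub utf8_byte_length {
    my ($str) = @_;      # work on a copy so the caller's string is unchanged
    utf8::encode($str);  # replace characters with their UTF-8 byte sequence
    return length $str;  # now counts bytes, not characters
}

# ASCII 'A', the chr(400) from the example, the euro sign, and an emoji.
for my $cp (0x41, 400, 0x20AC, 0x1F600) {
    my $ch = chr $cp;
    printf "U+%04X: length %d, UTF-8 bytes %d\n",
        $cp, length $ch, utf8_byte_length($ch);
}
```

Each character has length 1, while the UTF-8 byte counts come out as 1, 2, 3 and 4 respectively.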

So, to see whether a string 4 times bigger is enough on its own to make the regex 4 times slower, run the same test with a bigger string and compare the result with the tests in this node.
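A minimal sketch of that test using the core Benchmark module's cmpthese. The sample text and the pattern /foo\s+and\s+bar/ are invented for illustration, since the OP's real data is not shown here. Besides the 4x bigger string, it adds a third case: the same text upgraded to Perl's internal UTF-8 representation with utf8::upgrade (needs 5.8 or later), so the UTF-8 part of the hypothesis can be checked separately from string length.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 2,000 short ASCII lines, one "foo and bar" per line.
my $base = join '', map { "line $_: some sample text with foo and bar\n" } 1 .. 2_000;

# Same content repeated 4 times: is a 4x bigger string alone 4x slower?
my $big = $base x 4;

# Same content as $base, but stored with Perl's internal UTF-8 flag switched on.
my $utf8 = $base;
utf8::upgrade($utf8);

# Count matches of the hypothetical pattern. $_[0] aliases the caller's
# string, so no copy is made; pos() resets when the final match fails.
sub count_matches {
    my $n = 0;
    $n++ while $_[0] =~ /foo\s+and\s+bar/g;
    return $n;
}

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    base  => sub { count_matches($base) },
    big4x => sub { count_matches($big) },
    utf8  => sub { count_matches($utf8) },
});
```

Because the sample text is pure ASCII, utf8::upgrade leaves its byte length unchanged and only the internal flag differs, so a gap between base and utf8 would point at the UTF-8 code path rather than at string size.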

But note that the regex engine in Perl 5.8.x is much more complex than in Perl 5.6.x, simply because it has to handle the different encodings that Perl supports. Maybe you should look for a pragma that disables UTF-8 handling in regexes (I haven't found one) rather than trying to recompile Perl.
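No such pragma is shown here either. As an alternative that is not a pragma, and assuming the slowdown comes from strings that carry Perl's internal UTF-8 flag, the built-in utf8::downgrade can convert such a string back to native 8-bit bytes whenever every character fits in one byte. This is a sketch only: the helper name to_native_bytes is hypothetical, utf8::is_utf8 is used just to show the flag and may not exist on the earliest 5.8 releases, and whether this speeds up the OP's regexes is untested.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $str stored as native 8-bit bytes
# (UTF-8 flag off), or undef if it holds a character above 0xFF.
sub to_native_bytes {
    my ($str) = @_;    # work on a copy; the caller's string is left alone
    # With FAIL_OK set to 1, utf8::downgrade returns false instead of dying.
    return utf8::downgrade($str, 1) ? $str : undef;
}

my $text = 'some sample text with foo and bar';
utf8::upgrade($text);    # simulate data that arrived with the UTF-8 flag on

my $plain = to_native_bytes($text);
print 'before: ', (utf8::is_utf8($text)  ? 'UTF-8 flag on' : 'UTF-8 flag off'), "\n";
print 'after:  ', (utf8::is_utf8($plain) ? 'UTF-8 flag on' : 'UTF-8 flag off'), "\n";
print 'chr(400) downgradable? ', (defined to_native_bytes(chr 400) ? 'yes' : 'no'), "\n";
```

The downgraded copy compares equal to the original with eq; only its internal storage changes.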

Graciliano M. P.
"Creativity is the expression of the liberty".

Re: Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by Anonymous Monk on Jan 20, 2004 at 23:44 UTC
    Soo, to see if just a string 4 times bigger can make the REGEXP 4 times slow

    What a load of crap - This has to be seriously one of the worst answers that I have seen posted on this site, dressed up as determinate rationale ...

    The OP would be better off following the avenues of investigation offered by other posters, namely: increasing the test sample size, employing a better test framework (Benchmark), considering the difference between threaded and unthreaded versions of Perl (the performance difference between the two can be quite significant, even where threads are not used), and following up with perl5-porters.
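To make the threaded/unthreaded comparison concrete, a minimal sketch that reports the version of whichever perl runs it and whether that perl was built with ithreads, using the core Config module's useithreads and archname entries. The helper name build_summary is hypothetical; running the script under both the 5.6 and the 5.8 perl shows whether the two timings come from comparable builds.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: summarise build options that matter when comparing
# timings across perls ($Config{useithreads} is 'define' or undef).
sub build_summary {
    my $threads = $Config{useithreads} ? 'threaded (ithreads)' : 'unthreaded';
    return sprintf 'perl %vd, %s, archname %s', $^V, $threads, $Config{archname};
}

print build_summary(), "\n";
```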

      So gmpassos' post was incorrect; your response, however, seems unnecessarily aggressive. Particularly when posting anonymously, care needs to be taken to avoid appearing abusive.

      The second sentence of gmpassos' message begins by saying that he might be wrong. He offered a suggestion that might not have been right, but there's no reason to believe his intention was anything other than to try to help.

      edit: fixed tpyo
