I was looking at the different timings, and saw that Perl 5.8.x is about 4 times slower than Perl 5.6.x.

I may be wrong, but in Perl 5.8 a string stored as UTF-8 can take more than one byte per character (up to 4 bytes for the characters currently allocated), and the regex engine has to handle that when scanning the string.

From POD, perlunicode:

UTF-8 is a variable-length (1 to 6 bytes, current character allocations require 4 bytes)...
And from bytes:
As an example, when Perl sees $x = chr(400), it encodes the character in UTF-8 and stores it in $x. Then it is marked as character data, so, for instance, length $x returns 1. However, in the scope of the bytes pragma, $x is treated as a series of bytes - the bytes that make up the UTF8 encoding - and length $x returns 2:
So this code:

$x = chr(400);
print 'Length: ', length $x, qq~\n~;
{
    use bytes;
    print 'Length (bytes): ', length $x, qq~\n~;
}

has the output:

Length: 1
Length (bytes): 2
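
To make the "variable-length" part concrete, here is a minimal sketch that prints the character length and the byte length for a few code points. The helper name char_and_byte_length and the chosen code points are my own, picked only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the POD): returns the length of a string
# in characters and its length in bytes as Perl stores it internally.
sub char_and_byte_length {
    my ($str) = @_;
    my $chars = length $str;
    my $bytes = do { use bytes; length $str };   # byte semantics only inside this block
    return ($chars, $bytes);
}

# Sample code points, chosen to need 1, 2, 3 and 4 bytes in UTF-8.
for my $code (65, 400, 0x20AC, 0x10000) {
    my ($chars, $bytes) = char_and_byte_length(chr $code);
    printf "U+%04X: %d character, %d byte(s)\n", $code, $chars, $bytes;
}
```

Each string is a single character, but the byte count grows with the code point, so a string does not automatically become 4 times bigger: it depends on which characters it holds.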

So, to see whether a string that is simply 4 times bigger can make the regex 4 times slower, run the same test again with a bigger string and compare the result with the timings in this node.
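
A rough sketch of that comparison, using the core Benchmark module. The sample text and the pattern are placeholders of mine, not the ones from the original test, so substitute the real data before drawing any conclusion:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder data (assumption): replace with the string and regex of the real test.
my $base   = 'some sample text to search through ' x 1000;
my $bigger = $base x 4;                  # same content, 4 times longer
my $regex  = qr/\bneedle\b/;             # never matches, so the whole string is examined

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    base   => sub { my $hit = $base   =~ $regex },
    bigger => sub { my $hit = $bigger =~ $regex },
});
```

If the rate for "bigger" is roughly a quarter of the rate for "base" on the same Perl, string size alone could explain a slowdown of that order. If it is not, the difference between 5.6 and 5.8 is more likely in how the regex engine treats the data.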

But note that the regex engine in Perl 5.8.x is much more complex than the one in Perl 5.6.x, because it has to handle the different encodings that Perl supports. Maybe you should look for a pragma that disables UTF-8 handling in regexes (I haven't found one), rather than trying to recompile Perl.
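
Before hunting for a pragma, it may be worth checking whether the strings being matched are really stored as UTF-8. A minimal sketch, assuming a Perl recent enough to have utf8::is_utf8 (5.8.1 or later, as far as I know). The helper name describe_flag and the sample strings are mine:

```perl
use strict;
use warnings;

# Hypothetical helper: says whether a string carries Perl's internal UTF-8 flag.
sub describe_flag {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 'UTF-8 flagged' : 'byte string';
}

my $plain = 'abc';            # plain ASCII literal, stored as bytes
my $wide  = chr(400);         # needs UTF-8 internally

my $upgraded = $plain;
utf8::upgrade($upgraded);     # same characters, now stored as UTF-8

my $restored = $upgraded;
utf8::downgrade($restored, 1);   # second argument: return false instead of dying on failure

print "plain:    ", describe_flag($plain),    "\n";
print "wide:     ", describe_flag($wide),     "\n";
print "upgraded: ", describe_flag($upgraded), "\n";
print "restored: ", describe_flag($restored), "\n";
```

If the test data turns out to be UTF-8 flagged without needing to be, downgrading it before the match is one thing to try. This is only a suggestion to test, not a confirmed fix for the slowdown.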

Graciliano M. P.
"Creativity is the expression of the liberty".


In reply to Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6 by gmpassos
in thread serious regex performance degradation after upgrade to perl 5.8 from 5.6 by dmandel
