Hello all. I've run into a serious regex performance degradation after upgrading from Perl 5.6.1 to Perl 5.8.2. I've searched the web extensively and recompiled with various options, but haven't found many answers. Below are timings of the same one-liner run three times under each version (perl is 5.8.2, perl5.6.1 is 5.6.1) that illustrate the slowdown:
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m3.420s
user 0m2.590s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.151s
user 0m0.050s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m2.584s
user 0m2.580s
sys 0m0.010s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.108s
user 0m0.050s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m2.718s
user 0m2.570s
sys 0m0.010s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.047s
user 0m0.050s
sys 0m0.000s
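To repeat the comparison a little more carefully than with time on a one-liner, here is a minimal sketch of a standalone script. It is hypothetical, not part of the original post. It uses the core Benchmark module (available in both 5.6.1 and 5.8.2) to time the same 100 substitution passes, and prints $] so the one file can be run under each perl. It also checks one property of the test pattern: each match is replaced by its own captured prefix, so the substitution only deletes i/I characters. That means the first pass removes every "i" and the remaining 99 passes are pure failed searches.

```perl
#!/usr/bin/perl
# Hypothetical benchmark script (not from the original post).
# Times the substitution from the one-liners with the core Benchmark
# module and reports $] so the same file runs under perl5.6.1 and perl.
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Same 2,600-character input as the one-liners: "abc...z" repeated 100 times.
my $input = join('', 'a' .. 'z') x 100;

my $result;
my $t = timeit(1, sub {
    my $x = $input;
    # 100 passes of the substitution from the post. The first pass deletes
    # every "i"; the remaining 99 passes find nothing to replace.
    for (1 .. 100) {
        $x =~ s/(.*?)I/$1/isge;
    }
    $result = $x;
});

# Sanity check: the substitution only removes i/I, so tr/// should agree.
(my $expected = $input) =~ tr/iI//d;

printf "perl %s: %s\n", $], timestr($t);
print $result eq $expected ? "result matches tr/iI//d\n" : "result differs\n";
```

Running it once as perl5.6.1 and once as perl (the script filename is up to you) gives directly comparable timings from the same code.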
Clearly that particular regex isn't doing anything useful on its own, but it turned out to be the portion of a larger, useful regex that was slowing things down. In case it helps explain things, here is the output of running each of the two Perls with the -V option:
5.8.2:
[dmandel@midgard dmandel]# perl -V
Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration:
  Platform:
    osname=linux, osvers=2.4.20-28.9, archname=i686-linux-thread-multi
    uname='linux midgard 2.4.20-28.9 #1 thu dec 18 13:45:22 est 2003 i686 i686 i386 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Built under linux
  Compiled at Jan 8 2004 21:52:16
  @INC:
    /usr/local/lib/perl5/5.8.2/i686-linux-thread-multi
    /usr/local/lib/perl5/5.8.2
    /usr/local/lib/perl5/site_perl/5.8.2/i686-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.8.2
    /usr/local/lib/perl5/site_perl
and 5.6.1:
[dmandel@midgard dmandel]# perl5.6.1 -V
Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
    osname=linux, osvers=2.4.21-1.1931.2.393.entsmp, archname=i386-linux
    uname='linux bugs.devel.redhat.com 2.4.21-1.1931.2.393.entsmp #1 smp thu aug 14 14:47:21 edt 2003 i686 unknown '
    config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dcccdlflags=-fPIC -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Uusethreads -Uuseithreads -Uuselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Di_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Dinc_version_list=5.6.0/i386-linux 5.6.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include',
    optimize='-O2 -march=i386 -mcpu=i686',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.3 2.96-113)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl):
  Compile-time options:
  Built under linux
  Compiled at Aug 18 2003 16:08:31
  @INC:
    /usr/lib/perl5/5.6.1/i386-linux
    /usr/lib/perl5/5.6.1
    /usr/lib/perl5/site_perl/5.6.1/i386-linux
    /usr/lib/perl5/site_perl/5.6.1
    /usr/lib/perl5/site_perl/5.6.0/i386-linux
    /usr/lib/perl5/site_perl/5.6.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.6.1/i386-linux
    /usr/lib/perl5/vendor_perl/5.6.1
    /usr/lib/perl5/vendor_perl
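The two -V dumps differ in several build settings. Most visibly, 5.8.2 was built with useithreads and usemultiplicity defined and optimize='-O3', while 5.6.1 has no thread support and optimize='-O2 -march=i386 -mcpu=i686'. These are differences, not a diagnosis. To compare just those fields side by side, here is a minimal sketch of a hypothetical helper script (not from the original post) that reads them from the core Config module. The list of keys is simply picked from the dumps above.

```perl
#!/usr/bin/perl
# Hypothetical helper (not from the original post): prints selected build
# settings from the core Config module so that each perl binary reports
# them in the same format.
use strict;
use warnings;
use Config;

# Keys taken from the -V output; settings recorded as undef in the build
# configuration are undefined in %Config, so print them as "undef".
my @keys = qw(version archname usethreads useithreads usemultiplicity
              useperlio uselargefiles optimize gccversion);

for my $key (@keys) {
    my $value = defined $Config{$key} ? $Config{$key} : 'undef';
    printf "%-16s %s\n", $key, $value;
}
```

A single value can also be queried from the shell with perl's -V:name switch, for example perl -V:useithreads and perl5.6.1 -V:useithreads.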
Thank you for your time. Sincerely, Danny Mandel

Edited by Chady -- added readmore tag.


In reply to serious regex performance degradation after upgrade to perl 5.8 from 5.6 by dmandel
