in reply to Things you should need to know before using Perl regexes. (Humour, with a serious point)

Further, every time you use capturing brackets, all the captured chunks are also copied--again.

Not exactly ;)

Q:\>perl -le "($x = 'foo') =~ /.(.)/g; print $1; $x = 'bar'; print $1"
o
a

This is perl, v5.8.2 built for MSWin32-x86-multi-thread

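Of course, the /g modifier there is a bug in the code, but it still shows that a capturing match does not copy the buffer in every case: after $x is reassigned, $1 yields the second character of the new value, so it is still reading from $x's buffer rather than from a copy.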
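For what it's worth, here is a minimal sketch of how to sidestep the surprise. It is not part of the one-liner, and the variable name $second is only for illustration. Take the captures from the list-context return value of the match, so you hold your own copy and later changes to $x cannot affect it.

```perl
use strict;
use warnings;

my $x = 'foo';

# In list context the match returns the captured substrings,
# which are assigned (copied) into our own lexical variable.
my ($second) = $x =~ /.(.)/;

$x = 'bar';           # reassigning $x no longer matters

print "$second\n";    # prints "o"
```

The same applies to code that reads $1 directly: copy it into a lexical right after a successful match instead of relying on it later.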

Thanks to dave_the_m's and demerphq's recent work, the Perl 5.10 regex engine is improving well beyond the elimination of C recursion. It has named captures that match, and go beyond, what other named-capture implementations provide, and, as far as I understand the changes, it is considerably faster on Unicode strings. There are still some deeper problems with how closures in regular expressions are handled vs. interpolation (in (?{..}) blocks).
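As a rough illustration of the named-capture syntax, here is a small sketch. It assumes perl 5.10 or later, and the sample string and group names are made up for this example.

```perl
use strict;
use warnings;
use 5.010;    # named captures and say() need perl 5.10 or later

my $stamp = 'Oct 25, 2006';    # made-up sample input
my ($month, $day, $year);

if ($stamp =~ /(?<month>\w+)\s+(?<day>\d+),\s+(?<year>\d{4})/) {
    # %+ maps each group name to the text that group captured
    ($month, $day, $year) = ($+{month}, $+{day}, $+{year});
    say "$year $month $day";
}
```

Referring to groups by name rather than by position means the code keeps working if another pair of capturing brackets is later added to the pattern.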

This post is mostly about adding some perspective on the changes that are happening to the regex engine ;)

Re^2: Things you should need to know before using Perl regexes. (Humour, with a serious point)
by tinita (Parson) on Oct 25, 2006 at 12:37 UTC
    Q:\>perl -le "($x = 'foo') =~ /.(.)/g; print $1; $x = 'bar'; print $1"
    o
    a

    wow. didn't expect that.

    betterworld and i just tried out some other examples, and it seems that the string buffer is not really emptied.

    $ perl -MData::Dumper -we'$Data::Dumper::Useqq = 1; ($x = "fou") =~ /.(..)/g; print Dumper $1; $x = "b"; print Dumper $1; '
    $VAR1 = "ou";
    $VAR1 = "\0u";

    $ perl -MData::Dumper -we'$Data::Dumper::Useqq = 1; ($x = "fou") =~ /.(..)/g; print Dumper $1; $x = "ba"; print Dumper $1; '
    $VAR1 = "ou";
    $VAR1 = "a\0";

    so $1 just outputs the second and third characters of the buffer, and in the first example you can still see the remains of the string 'fou' (the trailing "u").