casiano has asked for the wisdom of the Perl Monks concerning the following question:

I wrote this program to compare the efficiency of the Perl 5.10 possessive quantifiers (++, *+, etc.) against the ordinary quantifiers:

pl@nereida:~/Lperltesting$ cat comparequotedstrings.pl
#!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
use v5.10;

use Benchmark qw{cmpthese};

# See http://www.regex-engineer.org/slides/img10.html
my $regexp = qr/
  "               # double quote
  (?:             # no memory
      [^"\\]++    # no " or escape: Don't backtrack
    | (?: \\.)++  # escaped character
  )*+             # Don't backtrack
  "               # end double quote
/x;

my $bregexp = qr/
  "               # double quote
  (?:             # no memory
      [^"\\]+     # no " or escape: backtrack
    | (?: \\.)+   # escaped character
  )*              # backtrack
  "               # end double quote
/x;

# input matches many times. Using "g" option
my $input = (q{"abc\"defg"hijk}x10000);

cmpthese(
  0,
  {
    gbacktrack  => q{ $input =~ /$bregexp/g },
    gpossessive => q{ $input =~ /$regexp/g }
  }
);

#input matches a long string, no g option
$input = '"'.(q{abc\"defg}x10000).'"';

cmpthese(
  0,
  {
    backtrack  => q{ $input =~ /$bregexp/ },
    possessive => q{ $input =~ /$regexp/ }
  }
);

# input does not match. Using "g" option
$input = '"'.("abcdefghijk"x10000);

cmpthese(
  0,
  {
    failgbacktrack  => q{ $input =~ /$bregexp/g },
    failgpossessive => q{ $input =~ /$regexp/g }
  }
);
But the results don't produce the expected difference.
pl@nereida:~/Lperltesting$ ./comparequotedstrings.pl
                Rate gpossessive  gbacktrack
gpossessive 645439/s          --         -1%
gbacktrack  649435/s          1%          --
               Rate possessive  backtrack
possessive 694364/s         --        -0%
backtrack  696767/s         0%         --
                    Rate failgpossessive  failgbacktrack
failgpossessive 649435/s              --             -2%
failgbacktrack  659647/s              2%              --
pl@nereida:~/Lperltesting$
It seems as if there is no significant difference, even for the failing case. Am I doing something wrong?

Re: Possessive Quantifiers in Perl 5.10 regexps
by ikegami (Patriarch) on Sep 04, 2009 at 18:08 UTC

    Change

    gbacktrack  => q{ $input =~ /$bregexp/g },
    gpossessive => q{ $input =~ /$regexp/g }

    to

    gbacktrack  => q{ use strict; use warnings; $input =~ /$bregexp/g },
    gpossessive => q{ use strict; use warnings; $input =~ /$regexp/g }

    and you'll see what you're doing wrong. Code compiled inside of Benchmark can't see your lexicals, so you're benchmarking

    '' =~ //g
    against
    '' =~ //g

    It's no surprise you get the same results. You'll need to use package variables. (Switch my to our.)
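
A minimal sketch of the scoping issue described above, assuming a stock Benchmark module. The names $lexical, $global and %seen are made up for illustration. It passes one string snippet and one code reference to timeit and records which variables each could see.

```perl
use strict;
use warnings;
use Benchmark qw(timeit);

my  $lexical = 'file-scoped lexical';   # not visible to string code
our $global  = 'package variable';      # reachable as $main::global
our %seen;                              # records what each snippet saw

# String form: Benchmark compiles this text itself, outside this file's
# lexical scope and without strict, so $lexical quietly resolves to the
# undefined package variable $main::lexical.
timeit(1, q{
    $seen{string_lexical} = defined $lexical ? 1 : 0;
    $seen{string_global}  = defined $global  ? 1 : 0;
});

# Code-ref form: a closure compiled right here, so it captures $lexical.
timeit(1, sub {
    $seen{sub_lexical} = defined $lexical ? 1 : 0;
});

printf "%-15s %d\n", $_, $seen{$_} for sort keys %seen;
```

The string snippet sees only the package variable, while the closure also sees the lexical. That is why the original string snippets ended up matching an empty string against an empty pattern.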

Re: Possessive Quantifiers in Perl 5.10 regexps
by moritz (Cardinal) on Sep 04, 2009 at 18:10 UTC

    Your regex doesn't backtrack if there is a closing quote, so the possessive quantifier doesn't help in that case.

    I guess that the regex engine optimizes the search by looking for a literal ", and fails immediately if it can't find one. That works the same way in both cases.

    Update: to disable this optimization you can use ["'] instead of " (a character class containing a single character doesn't disable it).
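
A small sketch of what the possessive form changes, and why it makes no difference on a well-formed quoted string. The variable names and sample strings are illustrative only; it needs Perl 5.10 or later.

```perl
use 5.010;
use strict;
use warnings;

my $text = 'aaa';

# Greedy: a+ first takes all three a's, then gives one back so that the
# final "a" in the pattern can match.
my $greedy = $text =~ /^a+a$/ ? 1 : 0;

# Possessive: a++ takes all three a's and never gives any back, so the
# final "a" has nothing left to match and the whole match fails.
my $possessive = $text =~ /^a++a$/ ? 1 : 0;

# On a properly closed quoted string neither pattern needs to backtrack,
# so both capture exactly the same text.
my $quoted = q{"abc\"def" tail};
my ($g) = $quoted =~ /("(?:[^"\\]+|(?:\\.)+)*")/;
my ($p) = $quoted =~ /("(?:[^"\\]++|(?:\\.)++)*+")/;

say "greedy=$greedy possessive=$possessive";
say "same capture: ", ($g eq $p ? "yes" : "no");
```

The two forms only diverge when the engine would otherwise have to give characters back, which is what happens when the closing quote is missing.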

Re: Possessive Quantifiers in Perl 5.10 regexps
by casiano (Pilgrim) on Sep 05, 2009 at 07:15 UTC
    Thanks ikegami, Thanks moritz.

    I have modified the code according to your comments. It is now as follows:

    pl@nereida:~/Lperltesting$ cat ./comparequotedstrings.pl
    #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
    use v5.10;

    use Benchmark qw{cmpthese};

    # See http://www.regex-engineer.org/slides/img10.html
    our $regexp = qr/
      ["']            # double quote
      (?:             # no memory
          [^"\\]++    # no " or escape: Don't backtrack
        | (?: \\.)++  # escaped character
      )*+             # Don't backtrack
      ["']            # end double quote
    /x;

    our $bregexp = qr/
      ["']            # double quote
      (?:             # no memory
          [^"\\]+     # no " or escape: backtrack
        | (?: \\.)+   # escaped character
      )*              # backtrack
      ["']            # end double quote
    /x;

    # input matches many times. Using "g" option
    our $input = (q{"abc\"defg"hijk}x10000);

    cmpthese(
      0,
      {
        gbacktrack  => sub { $input =~ /$bregexp/g },
        gpossessive => sub { $input =~ /$regexp/g }
      }
    );

    #input matches a long string, no g option
    $input = '"'.(q{abc\"defg}x10000).'"';

    cmpthese(
      0,
      {
        backtrack  => sub { $input =~ /$bregexp/ },
        possessive => sub { $input =~ /$regexp/ }
      }
    );

    # input does not match. Using "g" option
    $input = '"'.("abcdefghijk"x10000);

    cmpthese(
      0,
      {
        failgbacktrack  => sub { $input =~ /$bregexp/g },
        failgpossessive => sub { $input =~ /$regexp/g }
      }
    );

    # Input does not match. Force the nested parenthesis
    # to work.
    our $quotes = q{\\"}x30;
    $input = '"'.("abcdef $quotes ghijk"x1000);

    cmpthese(
      0,
      {
        failgbacktracknested  => sub { $input =~ /$bregexp/g },
        failgpossessivenested => sub { $input =~ /$regexp/g }
      }
    );
    Hope there are no more bugs.
    The possessive quantifier now wins in the first case where the input fails to match:
    $input = '"'.("abcdefghijk"x10000);
    This is the only case where the version with the possessive quantifiers wins by a large margin; in the long-match case it is only about 4% faster.

    Observe the huge difference in the final case (which also fails to match), where the possessive version is by far the slower one:

    pl@nereida:~/Lperltesting$ ./comparequotedstrings.pl
                    Rate gpossessive  gbacktrack
    gpossessive 569399/s          --        -14%
    gbacktrack  665929/s         17%          --
                Rate  backtrack possessive
    backtrack  163/s         --        -4%
    possessive 169/s         4%         --
                      Rate  failgbacktrack failgpossessive
    failgbacktrack  16.6/s              --            -97%
    failgpossessive  583/s           3419%              --
    pl@nereida:~/Lperltesting$ ./comparequotedstrings.pl
                    Rate gpossessive  gbacktrack
    gpossessive 588574/s          --        -15%
    gbacktrack  694595/s         18%          --
                Rate  backtrack possessive
    backtrack  164/s         --        -4%
    possessive 171/s         4%         --
                      Rate  failgbacktrack failgpossessive
    failgbacktrack  17.3/s              --            -97%
    failgpossessive  583/s           3276%              --
    (warning: too few iterations for a reliable count)
                            s/iter failgpossessivenested failgbacktracknested
    failgpossessivenested     23.1                    --                -100%
    failgbacktracknested  4.42e-02                52042%