in reply to Assessing the complexity of regular expressions

Calculating it may be too slow, but one reasonable measure is the length of the compiled version of the regular expression. You can get that printed to STDERR in Perl 5.10 with the re pragma. (I think you want the Debug DUMP option.) Unfortunately, capturing that output from within the same program is hard, but in principle you can compile the regex in an external process, capture that process's STDERR, and then look at it.
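A minimal sketch of the external-process idea, under some assumptions: regex_node_count is a hypothetical helper name, the score is simply the number of node lines in the dump, and the dump format it parses is debugging output rather than a stable interface, so the line-matching regex may need adjusting for a given perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of node lines in the compiled
# program dumped for $pattern, or undef if no dump could be read.
sub regex_node_count {
    my ($pattern) = @_;

    # Child program: point STDERR at STDOUT, then compile the pattern
    # with the re pragma's DUMP debugging option switched on.
    my $child = <<'CHILD';
open STDERR, '>&', \*STDOUT or die "cannot redirect STDERR: $!";
use re qw(Debug DUMP);
my $pattern = shift @ARGV;
my $re = eval { qr/$pattern/ };   # invalid patterns just yield no dump
CHILD

    # List-form pipe open, so no shell interprets the pattern. The '--'
    # stops perl from treating a pattern that starts with '-' as a switch.
    open my $fh, '-|', $^X, '-e', $child, '--', $pattern
        or die "cannot start child perl: $!";
    my @lines = <$fh>;
    close $fh;

    # Assumption: node lines in the dump look like "   1: EXACT <a> (3)".
    my $nodes = grep { /^\s*\d+:\s*[A-Z]/ } @lines;
    return $nodes ? $nodes : undef;
}

printf "%s => %s\n", $_, regex_node_count($_) // 'no dump'
    for 'a', '(?:a|b)+c';
```

Because the pattern reaches the child as a string and is interpolated at run time, perl should refuse (?{ }) and (??{ }) blocks in it unless use re 'eval' is in effect. That is not a full sandbox: a hostile pattern could still consume time or memory, so a timeout or resource limit on the child would be a sensible addition.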

Re^2: Assessing the complexity of regular expressions
by kyle (Abbot) on Jan 27, 2009 at 19:01 UTC

    I like that idea, but it's not really safe, what with code blocks inside of regular expressions these days.

    use re 'debug';
    my $x = qr/a(??{ BEGIN { die } })/;
    __END__
    Compiling REx "a(??{ BEGIN{ die } })"
    panic: top_env

    Tests with code other than die also show the BEGIN block being executed even though the regular expression is never used.

      It's always good to be on the lookout for security issues. However, just whose code do you plan on using Perl::Critic to critique? Isn't the idea to get people who are already able to execute arbitrary code on your team's systems to think about what they are doing? I'm not sure I'd want to run any random code from out in the wild through any part of Perl::Critic or any other development tool without checking for nasty things like that first. Perhaps in the right context, this safety risk is totally acceptable. In what situation are you using Perl::Critic that this would be a serious problem?

        There's a web service at http://perlcritic.com/ which will run Perl::Critic over code sent in from the web. I'm sure they'd like to keep that as safe as possible.

        There's a Perl::Critic::Dynamic distribution just for collecting policies that have to compile the code to run, but I suspect it doesn't get much use. The only policy in it is Perl::Critic::Policy::Dynamic::ValidateAgainstSymbolTable. If I wanted to score regular expressions using re as tilly suggests, it would have to go in with the (one) other dynamic policy, which doesn't seem desirable considering users who already expect that functionality in the core distribution.

      Good point. However I'll note that the risk is somewhat less when it is run in an external process. Though admittedly, it is not totally gone.