jar00n has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys,

I've got a script I'm working on, and performance and speed are the name of the game. I have optimized a large portion of the code, but this chunk stands out as something to be fixed or done in a more correct way.

# Don't expand metacharacters
$dir =~ s/\[/\\\[/g;
$dir =~ s/\]/\\\]/g;
$dir =~ s/\{/\\\}/g;
$dir =~ s/\~/\\\~/g;
$dir =~ s/\*/\\\*/g;
$dir =~ s/\?/\\\?/g;

I feel like there's got to be a better way to handle this and that the better way would be faster. That snippet will be called hundreds of thousands of times and I'm looking to squeeze out as much performance as I can.

Any tips or advice would be excellent.

--Shields

Replies are listed 'Best First'.
Re: Faster way to regex this
by hdb (Monsignor) on Jun 08, 2014 at 09:39 UTC

    Have you tried using a character class?

    $dir =~ s/([\[\]{}~*?])/\\$1/g;

      A look-ahead might be faster, but this still doesn't address the question of \ (backslash) handling raised by Corion below.

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $s = 'foo\bar\[baz[x]x{x}x~x*x?x[]{}~*?';
       print qq{'$s'};
       ;;
       $s =~ s{ (?= [][{}~*?]) }{\\}xmsg;
       print qq{'$s'};
      "
      'foo\bar\[baz[x]x{x}x~x*x?x[]{}~*?'
      'foo\bar\\[baz\[x\]x\{x\}x\~x\*x\?x\[\]\{\}\~\*\?'
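Since nobody in the thread has posted timings, here is a minimal sketch of how the three approaches could be compared with the core Benchmark module. The sample path and the sub names are made up for illustration, and the six-pass version escapes "{" as "\{" (the third substitution as posted writes "\}" instead). Which variant wins may depend on the Perl version and the input, so treat the numbers as a guide for your own data only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample path; real inputs will differ.
my $sample = 'data/[2014]/{a,b}/~tmp/*.log?';

# Six separate substitutions, one regex pass per character.
sub six_pass {
    my ($dir) = @_;
    $dir =~ s/\[/\\[/g;
    $dir =~ s/\]/\\]/g;
    $dir =~ s/\{/\\{/g;
    $dir =~ s/~/\\~/g;
    $dir =~ s/\*/\\*/g;
    $dir =~ s/\?/\\?/g;
    return $dir;
}

# One pass with a capturing character class.
sub char_class {
    my ($dir) = @_;
    $dir =~ s/([\[\]{~*?])/\\$1/g;
    return $dir;
}

# One pass with a zero-width look-ahead: insert a backslash
# in front of each special character, no capture needed.
sub look_ahead {
    my ($dir) = @_;
    $dir =~ s/(?=[\[\]{~*?])/\\/g;
    return $dir;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    six_pass   => sub { six_pass($sample) },
    char_class => sub { char_class($sample) },
    look_ahead => sub { look_ahead($sample) },
});
```

All three subs return the same string for the same input, so the comparison is purely about speed.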
Re: Faster way to regex this
by Corion (Patriarch) on Jun 08, 2014 at 09:52 UTC

    Are you sure that you don't need to treat the backslash like your other characters?

    Your current incarnation will leave foo\bar as-is and convert foo\[ to foo\\[. I'm not certain that that is what you want.
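To make the backslash point concrete, here is a small sketch that runs both inputs through two hypothetical helpers: escape_six handles only the six characters from the question, while escape_seven also escapes the backslash itself. The helper names are invented for this illustration.

```perl
use strict;
use warnings;

# Escape the six characters from the question; backslash is left alone.
sub escape_six {
    my ($s) = @_;
    $s =~ s/([\[\]{~*?])/\\$1/g;
    return $s;
}

# Same, but the backslash itself is escaped as well.
sub escape_seven {
    my ($s) = @_;
    $s =~ s/([\[\]{~*?\\])/\\$1/g;
    return $s;
}

# Single-quoted so the backslashes are literal characters.
for my $in ('foo\bar', 'foo\[') {
    printf "%-8s six: %-10s seven: %s\n",
        $in, escape_six($in), escape_seven($in);
}
```

With escape_six, a string that already contains a backslash comes out ambiguous: a consumer that treats backslash as an escape character cannot tell whether a backslash in the output was in the input or was added by the escaping.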

Re: Faster way to regex this
by davido (Cardinal) on Jun 08, 2014 at 15:42 UTC

    I don't have time to test, but generally shoving as much work as possible into "internals" is the way to go. The existing solution has to start up the regex engine six times. It would be better to start it once, and let it look for six things. (This isn't a universal win, but in your case, I think it will be.)

    $dir =~ s/([[\]{~*?])/\\$1/g;

    However, it's possible you would be just as happy from a functionality standpoint, and even happier from a performance standpoint with quotemeta, or \Q...\E:

    my $dir  = "[]{~*?";
    my $dir2 = "[]{~*?";
    $dir =~ s/([[\]{~*?])/\\$1/g;
    print "$dir\n";
    print quotemeta($dir2), "\n";

    As you can see, for these characters, they are functionally equivalent. However, quotemeta will also escape other "special" characters, so you should first familiarize yourself with how it behaves for your use case. quotemeta's advantage is that it doesn't invoke the regexp engine, which is a substantial apparatus.
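As a rough illustration of where the two approaches diverge, the sketch below uses a made-up path that contains ordinary punctuation as well as the six characters. quotemeta backslashes every ASCII character that is not a letter, digit, or underscore, so here it also escapes "/", "-", the space, and ".".

```perl
use strict;
use warnings;

# Hypothetical path with ordinary punctuation plus some of the six characters.
my $dir = 'logs/run-1 [old]/*.txt';

# Character-class version: only [ ] { ~ * ? are escaped.
(my $six = $dir) =~ s/([\[\]{~*?])/\\$1/g;

# quotemeta: every non-word ASCII character is escaped.
my $qm = quotemeta $dir;

print "class:     $six\n";
print "quotemeta: $qm\n";
```

Whether the extra backslashes are harmless depends on what consumes the string afterwards, so it is worth checking against the actual use case before switching.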


    Dave

Re: Faster way to regex this
by Anonymous Monk on Jun 08, 2014 at 09:38 UTC
    Um, quotemeta? :D Do it in one pass like String::Escape

    # $special_characters_escaped = backslash( $source_string );
    sub backslash ($) {
      local $_ = ( defined $_[0] ? $_[0] : '' );
      # Preserve only printable ASCII characters other than \, ", $, and @
      s/([^\x20\x21\x24\x25-\x39\x41-\x5b\x5d-\x7e])/\\$Backslashed{$1}/gs;
      return $_;
    }
Re: Faster way to regex this
by Anonymous Monk on Jun 08, 2014 at 10:09 UTC

    Are you sure you only want these six characters escaped? What about properly escaping the backslash itself?

    use feature 'say';   # needed for say

    # escape using quotemeta:
    say "\Q$dir\E";

    # escape only those six characters plus "\\":
    $dir =~ s/(?=[][{~*?\\])/\\/g;
    say $dir;