jettero has asked for the wisdom of the Perl Monks concerning the following question:

I am looking for a safe way to execute untrusted user regular expressions. If I have to, I'll XS up some GNU or PCRE regex support into my application...

But I'm hoping there's a better way.

I briefly considered Safe.pm and running regexps on the regexps to remove things like (?{ system "rm -rvf /" }), but I decided that really wasn't reliable enough, since Perl gives you so many ways to execute code in your regexps.

Is there a way to do this without XS?

UPDATE: This conversation went in a totally different direction than I was expecting. I'm totally fascinated by the C-stack limits of the pre 5.8.3 regexps. Wow, you learn something every day.

UPDATE #2: Well after the fact, I did end up using XS to bring glibc/gnu regex into perl: POSIX::Regex.

Replies are listed 'Best First'.
Re: safe untrusted regexp
by diotalevi (Canon) on Aug 16, 2006 at 14:06 UTC

    Perl won't let you compile regexps that contain (?{...}) or (??{...}) blocks at runtime unless you also declare use re 'eval'. That won't stop someone from giving you a regexp that's designed to run out of C stack. To solve that second problem, you could upgrade to the 5.9.3+ regexp engine, which isn't recursive and is now fully reentrant. There are patches against earlier versions of perl, but I don't have them handy to link to. Perhaps someone else will.
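A minimal sketch of the behaviour described above, assuming no use re 'eval' is in scope where the pattern is interpolated. The helper name compile_untrusted is hypothetical, made up for this illustration. It wraps qr// in an eval block so that a rejected (or simply malformed) pattern comes back as undef instead of killing the program.

```perl
use strict;
use warnings;

# Hypothetical helper: try to compile an untrusted pattern string.
# Returns the compiled regexp, or undef if Perl refused to compile it.
sub compile_untrusted {
    my ($pattern) = @_;

    # No "use re 'eval'" is in scope here, so an interpolated pattern
    # containing (?{...}) or (??{...}) makes qr// die at runtime.
    # Syntax errors in the pattern die here as well.
    my $re = eval { qr/$pattern/ };
    return $re;
}

for my $pattern ('^ab+c$', '(?{ print "pwned\n" })', 'a(??{ "b" })') {
    my $re = compile_untrusted($pattern);
    printf "%-28s %s\n", $pattern, defined $re ? 'compiled' : 'rejected';
}
```

This only covers code execution. It does nothing about patterns that exhaust the C stack or run for a very long time.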

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      I just hand-checked this "won't let you compile regexps" business. I'm completely surprised by that, thanks.

      Besides regexps that never finish, is there anything I actually do need to worry about when qr-ing untrusted user expressions?

        I didn't say it directly but now I will. A regexp on perl's recursive regexp engine can cause it to run out of C stack, which then triggers a segfault. That aborts your program. There are patches to perl for versions like 5.8.4+ (or similar) to either mitigate this or completely work around it. This problem is completely gone in 5.9.4. You could upgrade to that immediately if you wished. It was just released yesterday.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: safe untrusted regexp
by ikegami (Patriarch) on Aug 16, 2006 at 14:18 UTC
    In addition to the regexps using up stack space that diotalevi mentioned, there's also the problem of regexps that never return because they take too long to execute. It was last discussed in Timeout alarm for regex, with slow regular expressions as a follow-up.
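One possible way to bound the running time, sketched under assumptions and not necessarily what the linked threads recommend: run the match in a forked child and let the kernel kill it. Since perl 5.8 a Perl-level $SIG{ALRM} handler is deferred until the current opcode finishes, so it may not interrupt a long match. Leaving SIGALRM at its default disposition in the child avoids that. The name match_with_timeout and its return convention are hypothetical, and the sketch assumes a Unix-like system with a real fork.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 (match), 0 (no match),
# or undef (child was killed, e.g. by the timeout, or exited oddly).
sub match_with_timeout {
    my ($re, $text, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: keep SIGALRM at its default action so the kernel
        # terminates the process even while the regexp engine is busy.
        $SIG{ALRM} = 'DEFAULT';
        alarm $timeout;
        my $matched = $text =~ $re ? 1 : 0;
        # Skip END blocks and destructors inherited from the parent.
        POSIX::_exit($matched ? 0 : 1);
    }

    waitpid($pid, 0);
    my $status = $?;
    return undef if $status & 127;    # child died from a signal
    my $code = $status >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

# Nested quantifiers that may backtrack heavily; whether this is actually
# slow depends on the perl version and its optimizations.
my $re     = qr/((a{0,5}){0,5})*[c]/;
my $result = match_with_timeout($re, 'a' x 40, 2);
print defined $result ? ($result ? "match\n" : "no match\n") : "timed out\n";
```

The child only reports match or no match through its exit status, so captures are lost. Getting them back would need a pipe or similar, which is left out here.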