trippledubs has asked for the wisdom of the Perl Monks concerning the following question:

Is Anonymous Monk's code hackable? How? It seems to me that Perl's regex engine is so tricky that it could never be trusted with untrusted input, but by excluding the /e (evaluation) modifier, or by other methods, is it possible to give a malicious user most of the s/// functionality safely? See Re^4: Passing a regex from a CGI HTML form (user supplied regex substitution without eval)

Re: user supplied regex substitution
by davido (Cardinal) on Sep 01, 2016 at 15:14 UTC

    It is not safe to use a regular expression passed in as a string from untrustworthy user input. A regexp can be crafted that will consume all of memory, all of remaining time in the universe, or all of one processor core. This vulnerability is inherent in NFA and hybrid-NFA regexp engines, which tend to be the more powerful regexp implementations as compared to DFA. It is also possible to craft a regexp that will cause a segfault under some Perl versions.

    One cannot use alarm to interrupt the regexp engine either; with Perl's deferred ("safe") signals, the handler does not run until the engine hands control back, so in effect the alarm is ignored while the engine is working. Placing the evaluation and use of the untrusted regexp in a Safe compartment can provide some constraints, but still can't prevent memory and processing time abuse. Sys::SigAction is capable of interrupting a long-running regexp, but (despite what the documentation implies), even on Perl versions after 5.8, it's still possible for that interruption mid-regexp to cause a core dump.

    It is not safe to evaluate arbitrary code. Therefore, the /e modifier must be used with a degree of caution as well.
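
    As a rough illustration of the code-evaluation point (a sketch only, not a claim that this makes user-supplied patterns safe): when a pattern arrives in a variable and is interpolated at run time, Perl refuses embedded code blocks such as (?{ ... }) unless "use re 'eval'" is in scope, and a replacement taken from a variable is inserted as plain text when /e is not used. The variable names and sample inputs below are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical user input: a pattern that tries to embed Perl code.
my $user_pattern = '(?{ print "pwned\n" })a';

# Interpolating a string into a pattern at run time does not permit
# (?{ ... }) blocks unless "use re 'eval'" is in scope, so qr// dies here.
my $compiled = eval { qr/$user_pattern/ };
my $error    = $@;

print defined $compiled ? "compiled\n" : "rejected: $error";

# Hypothetical user-supplied replacement. Without /e it is not evaluated,
# and its contents are not interpolated a second time.
my $user_replacement = '$ENV{HOME}';
my $text             = 'abc';
$text =~ s/b/$user_replacement/;

print "$text\n";
```

    None of this addresses the time and memory problems described above; it only shows that leaving out /e and string eval keeps the pattern and replacement from running code.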

    Furthermore, a carefully crafted regexp could provide for introspection of values stored in %ENV, or other package globals including punctuation variables. This could leak information about the system or process that might be exploitable in some other way.

    It is possible, if one knows that a particularly inefficient regexp is in use, to craft a string that will exploit pathological behavior in that regular expression. For example, if I know that the regexp is m/(a*)*[^b]$/ and I as a user manage to pass in 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab' I might be able to get the process to hang for a looong time. Remember, we cannot simply set an alarm or safely use Sys::SigAction to abort long running or memory hungry regexes. It is incumbent on the programmer to sanitize inputs, and to know what the exploitable weaknesses are. And regardless of the programming language in use, any general-purpose programming language of sufficient capability can be used in a way that fails to minimize exposure to abuse.

    The string shown above will require that the regexp engine go through 5783 steps before failing. If we change the string to contain 74 "a" characters instead of 64, the number of steps grows so high that rxrx (the command-line front end to Regexp::Debugger) consumes all 16 GB of my physical RAM, and then drives the system so far into swap that it becomes unresponsive, requiring a reboot.
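
    One workaround that is not discussed in this thread, offered only as a sketch: since the engine cannot be safely interrupted in-process, run the match in a forked child and have the parent kill the child if it takes too long. The helper name match_with_timeout and its return convention are invented for this example. It assumes a Unix-like system with a real fork, and it does nothing to cap the child's memory use.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run a match in a child process, kill it on timeout.
# Returns 1 (matched), 0 (no match), or undef (timed out or bad pattern).
sub match_with_timeout {
    my ($pattern, $string, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: report the outcome through the exit code.
        # 0 = matched, 1 = no match, 2 = the pattern failed to compile.
        my $matched = eval { $string =~ /$pattern/ ? 1 : 0 };
        POSIX::_exit(defined $matched ? ($matched ? 0 : 1) : 2);
    }

    # Parent: the alarm fires while we wait in waitpid, not inside the
    # regexp engine, so the handler runs promptly and kills the child.
    local $SIG{ALRM} = sub { kill 'KILL', $pid };
    alarm $seconds;
    waitpid $pid, 0;
    alarm 0;
    my $status = $?;

    return if $status & 127;    # child was terminated by a signal
    my $code = $status >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

# The pattern and string from the post above, with a two-second budget.
my $result = match_with_timeout('(a*)*[^b]$', ('a' x 64) . 'b', 2);
print defined $result ? "finished: $result\n" : "gave up\n";
```

    Whether that last call finishes or times out depends on the Perl version and its regexp optimizations. Either way the parent process stays responsive, at the cost of a fork per match.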


    Dave

        Yes, that's where I began to learn how hard it is to safely accept user-supplied regexes. It started as an experiment in that regard. :)

        The tester uses Safe and some heuristics to reduce the vulnerability to introspection of globals and special variables. It uses Sys::SigAction to time out long-running regexes (and consequently has a tendency to segfault from time to time), looks for a few common "bad player" type regexes... and still isn't safe. But it runs in a little Heroku world, and when it does get cranky its scope is limited.


        Dave

Re: user supplied regex substitution
by ww (Archbishop) on Sep 01, 2016 at 14:23 UTC

    Q1: nearly unintelligible; which AM's code? hacked how?

    Q2: depends on intent of Q1. Please clarify.

    Q3: Confusing at best. Are you really asking about methods to hack-proof a regex (regexen, generally?) or how to "give a malicious user" an opportunity to do something baaaad?


    If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
Re: user supplied regex substitution
by Linicks (Scribe) on Sep 01, 2016 at 18:18 UTC

    I found a fairly good solution to this. See the thread I started and my reply marked as SOLVED.

    Nick