
It is not safe to use a regular expression passed in as a string from untrusted user input. A regexp can be crafted that will consume all of memory, all of one processor core, or all of the remaining time in the universe. This vulnerability is inherent in NFA and hybrid-NFA regexp engines, which tend to be the more powerful regexp implementations compared to DFA engines. It is also possible to craft a regexp that will cause a segfault under some Perl versions.

One cannot use alarm to interrupt the regexp engine either; while the engine is in control, alarm is ignored. Placing the evaluation and use of the untrusted regexp in a Safe compartment can provide some constraints, but it still can't prevent abuse of memory and processing time. Sys::SigAction is capable of interrupting a long-running regexp, but despite what the documentation implies, even on Perl versions after 5.8 it's still possible for that mid-regexp interruption to cause a core dump.
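One workaround that sidesteps in-process signals entirely is to run the match in a child process and have the parent kill it after a deadline. What follows is only a minimal sketch under that assumption, not something from the thread: match_with_timeout is a hypothetical helper name, the exit-code convention is my own, and it bounds wall-clock time only; it does nothing to cap the child's memory use.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep time);

# Hypothetical helper (not from the original post): run the match in a
# forked child and SIGKILL the child if it exceeds $timeout seconds.
# Returns 1 (matched), 0 (no match), or undef (timeout, bad pattern, crash).
sub match_with_timeout {
    my ($pattern, $string, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compile the untrusted pattern and report via exit status.
        my $re = eval { qr/$pattern/ };
        POSIX::_exit(2) unless defined $re;       # pattern did not compile
        POSIX::_exit($string =~ $re ? 0 : 1);     # 0 = match, 1 = no match
    }

    # Parent: poll for the child until the deadline passes.
    my $deadline = time() + $timeout;
    while (time() < $deadline) {
        if (waitpid($pid, WNOHANG) == $pid) {
            my $signal = $? & 127;
            my $code   = $? >> 8;
            return undef if $signal || $code == 2;
            return $code == 0 ? 1 : 0;
        }
        sleep 0.01;
    }

    # Deadline passed: SIGKILL cannot be deferred or ignored by the child.
    kill 'KILL', $pid;
    waitpid($pid, 0);
    return undef;
}
```

Something like match_with_timeout($user_pattern, $input, 2) would then give up after roughly two seconds. A fork per match is expensive, so this is a containment measure and not a substitute for refusing untrusted patterns in the first place.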

It is not safe to evaluate arbitrary code. Therefore, the /e modifier must be used with a degree of caution as well.

Furthermore, a carefully crafted regexp could provide for introspection of values stored in %ENV, or other package globals including punctuation variables. This could leak information about the system or process that might be exploitable in some other way.
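If the users only need literal search-and-replace (an assumption; the original question may call for real regexp features), the simplest defense is never to let their input act as a pattern or as code at all. A small sketch, with literal_replace being a hypothetical helper name of my own:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): literal search-and-replace
# in which neither untrusted input is treated as a pattern or as code.
sub literal_replace {
    my ($text, $find, $replace) = @_;

    # An empty pattern would make s/// reuse the last successful pattern.
    return $text unless length $find;

    # \Q...\E applies quotemeta, so metacharacters in $find match literally.
    # $replace is interpolated once as plain text; with no /e or /ee modifier
    # its contents are never evaluated.
    $text =~ s/\Q$find\E/$replace/g;
    return $text;
}
```

With this, a "pattern" such as (a*)* only ever matches those five literal characters, and a replacement such as $ENV{HOME} is inserted as that literal text without being looked up.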

It is possible, if one knows that a particularly inefficient regexp is in use, to craft a string that will exploit pathological behavior in that regular expression. For example, if I know that the regexp is m/(a*)*[^b]$/ and I as a user manage to pass in 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab', I might be able to get the process to hang for a very long time. Remember, we cannot simply set an alarm or safely use Sys::SigAction to abort long-running or memory-hungry regexps. It is incumbent on the programmer to sanitize inputs, and to know what the exploitable weaknesses are. And regardless of the programming language in use, any general-purpose programming language of sufficient capability can be used in a way that fails to minimize exposure to abuse.

The string shown above will require that the regexp engine go through 5783 steps before failing. If we change the string to contain 74 "a" characters instead of 64, the number of steps grows so high that rxrx (the Regexp::Debugger tool) consumes all 16 GB of my physical RAM, and then drives me so far into swap that the system becomes unresponsive, requiring a reboot.
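When the inefficient regexp is your own, the usual fix is to remove the nested quantifier. Here (a*)* accepts exactly the same runs of "a" as a plain a*, so the pattern can be flattened without changing which strings match; only what the capture group holds differs. A rough sketch that compares the two on short inputs, where the names $nested, $flat, and same_verdict are mine and purely illustrative:

```perl
use strict;
use warnings;

# The pattern from the post, and a flattened version with one quantifier.
# Perl may warn about the nested form, so regexp warnings are muted here.
my $nested = do { no warnings 'regexp'; qr/(a*)*[^b]$/ };
my $flat   = qr/a*[^b]$/;

# Hypothetical helper: true when both patterns give the same match verdict.
sub same_verdict {
    my ($string) = @_;
    my $n = ($string =~ $nested) ? 1 : 0;
    my $f = ($string =~ $flat)   ? 1 : 0;
    return $n == $f;
}
```

I have deliberately kept the inputs short; this only checks that the rewrite preserves the match verdict, and makes no claim about how many steps either form takes.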

Dave

In reply to Re: user supplied regex substitution by davido
in thread user supplied regex substitution by trippledubs
