PerlMonks  

Re: Allowing regex entries in web form to search database: Risks or gotchas?

by dave_the_m (Monsignor)
on Aug 08, 2022 at 21:39 UTC (#11146043)


in reply to Allowing regex entries in web form to search database: Risks or gotchas?

Perl's regex engine has evolved over 30+ years; it's huge and crusty, with large chunks nobody quite understands any more. There are many ways of writing a regex that will consume effectively infinite CPU unless you kill the process. Until recent perl releases, there were also many bugs in the regex compiler that would overflow integers and do strange things, e.g. in patterns like /((((foo){2000}){2000}){2000})/. And those are just the bugs we know about.

So I wouldn't want to allow the general public the ability to supply arbitrary patterns to a web server.

Not all is lost, however. Perl allows other regex engines to be plugged in. In particular, the module re::engine::RE2 allows perl to use Google's RE2 regex engine. RE2 doesn't support as many features as the perl engine, but in this case that's a plus.
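
A minimal sketch of how an untrusted pattern might be routed through RE2, assuming the CPAN module re::engine::RE2 is installed. The helper name re2_match is hypothetical. The -strict import option is taken from the module's documentation, where it is described as disallowing regexes that RE2 does not support rather than falling back to perl's own engine; treat that behaviour as an assumption to verify against your installed version.

```perl
use strict;
use warnings;

# Hypothetical helper: match $text against an untrusted $pattern using RE2.
# Returns 1 or 0 for match / no match, or undef if the pattern was rejected.
sub re2_match {
    my ($pattern, $text) = @_;

    # The pragma is lexically scoped, so only regexes compiled inside this
    # sub are handed to RE2. -strict (assumed per the module docs) turns
    # unsupported syntax into an error instead of a fallback.
    use re::engine::RE2 -strict => 1;

    # A bad or unsupported pattern dies at compile time; eval traps that.
    my $matched = eval {
        my $re = qr/$pattern/;
        $text =~ $re ? 1 : 0;
    };
    return $matched;
}

print re2_match('fo+', 'xfoo') ? "matched\n" : "no match\n";
```

Keeping both the compile and the match inside the pragma's scope avoids the pattern being recompiled by perl's own engine, which can happen if a compiled regex is later interpolated into a larger pattern elsewhere.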

Dave.

Re^2: Allowing regex entries in web form to search database: Risks or gotchas?
by Polyglot (Hermit) on Aug 09, 2022 at 00:44 UTC

    Dave,

    How much would limiting nested parentheses to two levels and {##} counts to two digits reduce that CPU resource hogging? Would there be an effective way of mitigating this?
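
The two limits asked about here could be checked before a pattern is ever compiled. The following is only a sketch: the function name pattern_limit_violation and its options are hypothetical, and the scanner is deliberately simple (for instance, it does not handle a literal "]" placed first inside a character class). Passing this check does not bound CPU use. A pattern such as (a+)+ stays within both limits yet has the classic shape that causes heavy backtracking.

```perl
use strict;
use warnings;

# Hypothetical pre-check. Returns a reason string if the pattern exceeds a
# limit, or undef if it passes. Defaults follow the limits in the question.
sub pattern_limit_violation {
    my ($pattern, %opt) = @_;
    my $max_depth  = $opt{max_depth}  // 2;
    my $max_digits = $opt{max_digits} // 2;

    my ($depth, $in_class) = (0, 0);
    my $len = length $pattern;
    for (my $i = 0; $i < $len; $i++) {
        my $c = substr $pattern, $i, 1;
        if ($c eq '\\') { $i++; next }     # skip the escaped character
        if ($in_class) {                   # inside [...]: nothing counts
            $in_class = 0 if $c eq ']';
            next;
        }
        if ($c eq '[') {
            $in_class = 1;
        }
        elsif ($c eq '(') {
            return "parentheses nested deeper than $max_depth"
                if ++$depth > $max_depth;
        }
        elsif ($c eq ')') {
            $depth-- if $depth > 0;
        }
        elsif ($c eq '{') {
            # Inspect only {n}, {n,}, {n,m} and {,m} style quantifiers.
            if (substr($pattern, $i) =~ /\A\{(\d*)(?:,(\d*))?\}/) {
                for my $n (grep { defined } $1, $2) {
                    return "quantifier count longer than $max_digits digits"
                        if length($n) > $max_digits;
                }
            }
        }
    }
    return;
}

for my $p ('((((foo){2000}){2000}){2000})', '(a|b){2,10}c') {
    my $why = pattern_limit_violation($p);
    print defined $why ? "rejected: $why\n" : "accepted: $p\n";
}
```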

    This is the sort of helpful tip I'm looking for. It does little good to say, ever so meaningfully, "You would be ill-advised to do this...." I'm looking for rational support for such a statement; that is, why it is inadvisable.

    Only once potential pitfalls are identified can one hope to address them. And I do hope to make things safer, albeit not completely foolproof.

    I'm reminded of a setting provided to server administrators in shorewall's firewall management tools... something like "ADMIN_IS_ABSENT_MINDED = 1". Hah! It was supposed to keep the current connection open in case of a firewall restart with ill-advised settings that might have inadvertently locked even the admin out! It's simply never possible to make something completely foolproof, and I don't intend to try. But I do want to make it, at the very least, secure from hacker penetration. Hogging CPU resources is one thing. Gaining server admin privileges through a security hole is another.

    Blessings,

    ~Polyglot~
