PerlMonks  

Re^2: Allowing regex entries in web form to search database: Risks or gotchas?

by Polyglot (Chaplain)
on Aug 09, 2022 at 00:20 UTC ([id://11146049])


in reply to Re: Allowing regex entries in web form to search database: Risks or gotchas?
in thread Allowing regex entries in web form to search database: Risks or gotchas?

Jenda,

I'm not entirely sure what you mean by "underlying engine." My script does the evaluation; I'm not depending on any third-party tools. That is as much because I rarely understand how to use other people's modules as for any other reason. (Object-oriented code baffles me.)

The regex evaluation is fairly simple and is meant to allow virtually any expression, with a few important exceptions, such as not allowing the user to insert executable code into it. Giving users the freedom to enter their own regular expressions is what makes the feature so attractive and powerful. Some things simply cannot be found properly without a good regex, and it would be impossible to pre-supply every regex form that might be needed.
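A minimal sketch of the "no executable code" exception, assuming the pattern arrives from the form as a plain string; `compile_user_regex` is a hypothetical name. It leans on documented Perl behaviour: a `(?{ ... })` or `(??{ ... })` code block that reaches a regex through runtime interpolation is refused unless `use re 'eval'` is in effect, so wrapping `qr//` in `eval` both blocks embedded code and catches ordinary syntax errors.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern string.
# Returns ($regex, '') on success, or (undef, $error) if Perl rejects it.
# Without "use re 'eval'", Perl dies on (?{ }) / (??{ }) blocks that arrive
# via interpolation, so the eval turns embedded code into a caught error too.
sub compile_user_regex {
    my ($pattern, $case_sensitive) = @_;
    my $re = eval { $case_sensitive ? qr/$pattern/ : qr/$pattern/i };
    return (undef, $@) unless defined $re;
    return ($re, '');
}

my ($re, $err) = compile_user_regex('colou?r', 0);
print defined $re ? "compiled\n" : "rejected: $err";
```

Note that this only covers code injection; it does nothing about patterns that are merely slow.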

Users also have several simple options that do not require them to write a regular expression: they may select case sensitivity, match whole words only (i.e. \bWord\b), or enter their own word/text delimiters. These options are ignored if the user enters his or her own regular expression, in which case whole-word matching and the like are left entirely to the user's own regex.
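The non-regex options could be turned into a pattern roughly as follows. This is a sketch under assumptions: `build_simple_regex` and its option names are hypothetical, the user's text is treated literally via `quotemeta`, and the user-defined delimiter option is not covered.

```perl
use strict;
use warnings;

# Hypothetical helper for the plain-text search mode: quotemeta escapes every
# metacharacter so the text is matched literally, \b...\b is added for
# whole-word matching, and /i is used unless case sensitivity is requested.
sub build_simple_regex {
    my ($text, %opt) = @_;
    my $quoted = quotemeta $text;
    $quoted = "\\b$quoted\\b" if $opt{whole_word};
    return $opt{case_sensitive} ? qr/$quoted/ : qr/$quoted/i;
}

my $re = build_simple_regex('word', whole_word => 1);
print "match\n"    if 'A Word here' =~ $re;   # case-insensitive whole word
print "no match\n" unless 'Swordfish' =~ $re; # 'word' inside a longer word
```

One caveat of this design: `\b` only gives whole-word behaviour when the search text starts and ends with a word character; for text such as `C++`, the trailing `\b` would demand a word character right after the `+`.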

As for "You should not look for dangerous stuff, you should check you only got safe stuff!", how would you propose to draw the line between the two? What defines "safe"? As with anything on this planet, even the safest of things can be made harmful in the wrong hands. That people can drown in water is no reason to withhold it and let them die of thirst!

Blessings,

~Polyglot~


Re^3: Allowing regex entries in web form to search database: Risks or gotchas?
by Jenda (Abbot) on Aug 10, 2022 at 20:32 UTC

    You wrote "database", so I assumed there's a database engine, say PostgreSQL, where you store the data. If so, you could use the regexps provided by that database engine, use Perl within that engine, or fetch all the data to be searched and evaluate the expressions within the script.

    It's you who defines safe and you need to decide what's safe for each individual use. The point is that instead of

    if ($input =~ /something I already know is dangerous/) { die 'I refuse to handle this!'; }
    you should always write
    if ($input !~ /^only stuff I know is fine$/) { die 'I refuse to handle this!'; }

    I can't give you a generic "this is unsafe" or a generic "this is safe" without knowing what happens to the $input afterwards. That's something you have to do. The thing is that it's much easier to forget to list something dangerous (on a list of forbidden things) than it is to accidentally allow something dangerous (on a list of permitted things).
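Applied to a regex search box, the "only stuff I know is fine" check might look like the sketch below. The allowed character set and the 200-character limit are illustrative assumptions, and `accept_pattern` is a hypothetical name. Leaving `{` and `}` out of the list means code blocks such as `(?{ ... })` cannot even be typed, at the cost of also forbidding counted quantifiers like `a{2,3}`. A whitelist of characters like this does not stop a pattern from backtracking for a very long time.

```perl
use strict;
use warnings;

# Hypothetical whitelist: word characters, whitespace and common regex
# metacharacters, 1 to 200 characters long. '{' and '}' are deliberately
# absent, which rules out (?{ ... }) code blocks and counted quantifiers.
my $ALLOWED = qr/\A[\w\s.^\$*+?|()\[\]\\-]{1,200}\z/;

sub accept_pattern {
    my ($input) = @_;
    return 0 unless $input =~ $ALLOWED;              # not on the list: refuse
    return 0 unless eval { my $re = qr/$input/; 1 }; # must also compile
    return 1;
}

for my $p ('colou?r', '(?{ 1 })', 'foo(') {
    printf "%-10s %s\n", $p, accept_pattern($p) ? 'accepted' : 'refused';
}
```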

    Jenda
    1984 was supposed to be a warning,
    not a manual!

      I was trying to find a source supporting the "whitelisting is safer than blacklisting" approach, but alas, most links revolved around whether these terms are racist and should be replaced, and if so, by what.

      Too much a master-slave dilemma for a shady dark-pinky guy like me.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        A blacklist (blocklist, redlist) is always playing catch-up.

      For clarification: I don't trust MySQL/MariaDB with the regex operations. Its only role is to hand over the records; they are then searched in Perl, and any that match are formatted and returned to the client's browser. The database is entirely isolated from the regex operations.
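The arrangement described here (the database only supplies rows, Perl does the matching) can be sketched as below. It assumes the rows were fetched with DBI's `fetchall_arrayref({})`, which returns an array reference of hash references keyed by column name; `filter_rows` and the `body` column are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only rows whose $column value matches $re.
# $rows is assumed to be an arrayref of hashrefs, the shape returned by
# DBI's fetchall_arrayref({}); NULL columns arrive as undef and are skipped.
sub filter_rows {
    my ($rows, $re, $column) = @_;
    return [ grep { defined $_->{$column} && $_->{$column} =~ $re } @$rows ];
}

my $rows = [ { id => 1, body => 'In the beginning' },
             { id => 2, body => 'and the earth was' },
             { id => 3, body => undef } ];
my $hits = filter_rows($rows, qr/\bbegin/i, 'body');
print "$_->{id}\n" for @$hits;    # prints 1
```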

      On one hand, I might agree with your premise that one should only use what is trusted. But that word "trusted" is precisely where things get sticky. What or whom do you trust?

      If you cannot define or distinguish between what is "safe and trusted" and what is "unsafe or dangerous," then you have no basis for saying "allow only what is safe."

      For illustration, personally, I don't trust Microsoft Windows anymore, having had too many virus and security issues with it in the past. One time I was having some issues with my router and couldn't get it to NAT the internet through to my PC, so I temporarily bypassed the router and hooked up directly to the DSL modem (to look for answers online to the router issue). I kid you not, within five minutes someone was beginning to control my computer, i.e. the mouse was moving and things were changing on screen without my input. I instantly disconnected the patch cable and never tried that again with a Windows computer. (I've done similar things with Linux and Mac OS X with no problem.) I mean, five minutes!

      Because Windows itself can be problematic, should one not trust it for anything? Where does one draw the line? This is the part you seem unwilling to attempt to define, and that is the weakness in your reasoning.

      There is no real-world chance of any software being 100% perfectly safe. One must, of necessity, work with a reasonable level of risk (some might use the term "manageable risk"). My original question here asked for guidance on what the specific risk factors might be. I have had very little response, other than about the possibility that wildcard-heavy patterns could tie up the CPU. To me, this indicates that the use of regex itself is not a big security risk; otherwise many people would have jumped in with reports of the known risks.
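One way to contain the CPU risk mentioned above is to put a time limit on each match. A plain `alarm` is not reliable for this: perlipc notes that, with Perl's deferred ("safe") signals, a signal arriving during a long-running opcode such as a regex match is not seen until that opcode finishes. The sketch below instead runs the match in a forked child and kills it if it overruns. It assumes a Unix-like system with `fork`; `match_with_timeout` is a hypothetical name, and since forking once per record would be slow, in practice one would run the whole search in a single child.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';          # WNOHANG
use Time::HiRes qw(sleep time);   # fractional sleep and time

# Returns 1 if $re matches $text, 0 if it does not,
# or undef if the match was still running after $seconds.
sub match_with_timeout {
    my ($re, $text, $seconds) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: do the (possibly slow) match, report via exit status.
        POSIX::_exit($text =~ $re ? 0 : 1);
    }
    my $deadline = time + $seconds;
    while (time < $deadline) {
        if (waitpid($pid, WNOHANG) == $pid) {
            return (($? >> 8) == 0) ? 1 : 0;   # child finished in time
        }
        sleep 0.05;
    }
    kill 'KILL', $pid;    # give up on a runaway match
    waitpid($pid, 0);     # reap the child so it does not linger
    return undef;
}

my $result = match_with_timeout(qr/colou?r/, 'What colour?', 2);
print defined $result ? "result: $result\n" : "timed out\n";
```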

      Which brings it back to the essential question: Are there any big "gotchas" with allowing regex in a search field?

      Blessings,

      ~Polyglot~
