in reply to Re: Re: Re: How to identify invalid reg. expr.?
in thread How to identify invalid reg. expr.?

Interesting. I didn't know you couldn't eval code in a regex at run-time. Well, even without that I could still hog your CPU by passing it a regex with an exponential solving time.

As for what you can do: don't accept a regex from an untrusted user. I don't believe there's any way to fully validate the friendliness of a regex. Maybe you could offer your users a set of pre-canned searches ("full-word search", "phrase search", "starts with", "ends with", etc.), then use the input to build the appropriate regex, with \Q$term\E to quote any metacharacters in the input.
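Something along these lines is what I have in mind. It's only a rough sketch: the mode names and the build_search_regex sub are made up for illustration, and you'd adapt them to whatever your search form offers.

```perl
use strict;
use warnings;

# Hypothetical table mapping a search mode to a pattern builder.
# The user's term is always wrapped in \Q...\E, so any regex
# metacharacters it contains are matched literally.
my %builders = (
    full_word   => sub { my $t = shift; qr/\b\Q$t\E\b/ },
    phrase      => sub { my $t = shift; qr/\Q$t\E/ },
    starts_with => sub { my $t = shift; qr/\A\Q$t\E/ },
    ends_with   => sub { my $t = shift; qr/\Q$t\E\z/ },
);

# Hypothetical helper: turn (mode, term) into a compiled regex.
# Dies on a mode that is not in the table above.
sub build_search_regex {
    my ($mode, $term) = @_;
    my $builder = $builders{$mode}
        or die "Unknown search mode: $mode\n";
    return $builder->($term);
}
```

You'd call it as my $re = build_search_regex('starts_with', $term); and then match with $text =~ $re. Note that the \b in the full-word case only behaves as expected when the term itself begins and ends with a word character.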

-sam

Re: How to identify invalid reg. expr.?
by Smylers (Pilgrim) on Jun 06, 2002 at 12:13 UTC
    I could still hog your CPU by passing it a regex with an exponential solving time.

    Could alarm be used to mitigate this?

    Smylers

      Hmm, I'd never heard of alarm before now, so I just threw a little something together to test:
      #!/usr/bin/perl -w
      BEGIN {
          # stuff
      }
      alarm 10;
      # the bulk of the program snipped.
      $_ = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
      print "Match\n<p>" if /a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*[b]/;
      That example is given in the Camel book as one that won't finish until after the heat death of the universe. Without the alarm, it does indeed hog the CPU; with the alarm in there, the script just dies after 10 seconds, which should be plenty long enough for any "valid" request to run. That limit can be changed, of course. I guess I'd also need to put in a signal handler to do any cleanup when the alarm goes off. I suppose I should also try putting the alarm inside the "search_by_regex" routine (since that's the only area I'm concerned about), and possibly look for a way to not die if the delay comes from a slow database or something other than a wonky regex.
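For what it's worth, here is a rough sketch of scoping the alarm to just the matching step, using the usual eval/alarm idiom so the script can recover instead of dying. The with_timeout helper and this body for search_by_regex are hypothetical stand-ins, and the rows are assumed to have been fetched already, so a slow database is not counted against the limit. One caveat: Perl versions with deferred ("safe") signals only run a %SIG handler between opcodes, so a handler like this may not fire until a single runaway match has finished; the plain alarm with no handler, as in the test script, still kills the process.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a limit of $seconds.
# Returns (1, result) on success and (0, undef) on timeout;
# any other error is re-thrown.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # Handler is local, so it is restored when the eval exits.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;              # finished in time: cancel the alarm
        1;
    };
    alarm 0;                  # make sure no alarm is left pending
    return (1, $result) if $ok;
    return (0, undef)   if $@ eq "timeout\n";
    die $@;                   # not a timeout: pass the error on
}

# Hypothetical stand-in for the search routine: only the matching
# is timed, not whatever produced @rows.
sub search_by_regex {
    my ($regex, @rows) = @_;
    my ($ok, $hits) = with_timeout(10, sub {
        return [ grep { $_ =~ $regex } @rows ];
    });
    return $ok ? $hits : undef;   # undef means "gave up"
}
```

Any cleanup can go in the caller once it sees undef come back, rather than in the signal handler itself.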

      I also implemented merlyn's eval/regex solution from this node in this thread (or just scroll down, maybe).
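The linked node isn't reproduced here, so the following is not necessarily merlyn's code, just a minimal sketch of the general idea: compile the user's pattern inside an eval and treat a failure to compile as an invalid regex. The compile_or_undef name is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return a compiled regex, or undef if
# $pattern does not compile. qr// dies on a syntax error, and the
# block eval traps that. Without "use re 'eval'" in scope, an
# interpolated pattern containing a (?{ ... }) code block is also
# rejected at run time.
sub compile_or_undef {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return $re;
}
```

Usage would be something like my $re = compile_or_undef($input) or print "Sorry, that pattern is not valid\n";. This only says the pattern compiles; it says nothing about how long it might take to run.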

      So, is this a viable solution, or are there still problems with it? Thanks.

      --

      There are 10 kinds of people -- those that understand binary, and those that don't.