jk2addict has asked for the wisdom of the Perl Monks concerning the following question:

This is probably a simple question with a complicated answer. I'm working on a website conversion from ASP->BAMP and am trying to dot as many i's and cross as many t's as possible.

That means full, complete Pod in every module, tests of the core in Test::More, and tests of the taglibs and pages in Apache::Test. And towards the top of the list, making sure that all user input is expected, as safe as possible, and untainted.

Sounds easy, right? Well, most of that list has been easy, but that last one has me stumped. I have no idea where to start when it comes to scrubbing user input on the web.

Sure, some of those things are easy to check. If it's a quantity field, only allow digits between 1 and the max allowed. But, what if they're inputting the description of something into a text area? What about the name of a product or the name of a vendor or company? Filtering for only a-zA-Z0-9 isn't practical. What about UTF and foreign characters?
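
For the easy case, the quantity field, a minimal sketch might look like the following. The sub name `clean_quantity` and the nine-digit cap are assumptions made up for illustration. The regex capture is what untaints the value under -T.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the untainted quantity, or undef if the
# input is not a whole number between 1 and $max.
sub clean_quantity {
    my ($raw, $max) = @_;
    return unless defined $raw;

    # [0-9] rather than \d, so only ASCII digits pass. No leading zero,
    # at most nine digits. The captured value is untainted under -T.
    my ($qty) = $raw =~ /\A([1-9][0-9]{0,8})\z/
        or return;

    return $qty <= $max ? $qty : undef;
}
```

Anything that fails the pattern (signs, whitespace, trailing junk) is rejected outright rather than repaired.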

Sure, I can disallow:

` . ; \ / @ & | % ~ < > " $ ( ) { } [ ] * ! '

But is that the correct answer? No periods to end a sentence? No $ sign? No exclamation marks? That's not very realistic either.

HTML::Sanitize seems to be only for HTML. Then there's Safe, but that merely shifts the problem into a safe compartment.

So after all my rambling, what are fellow monks doing in the real world?

Re: Preferred Way of Scrubbing User Input Before DB Write
by Zaxo (Archbishop) on Feb 02, 2004 at 20:02 UTC

    Assuming DBI, you can relax your restrictions a lot by using placeholders or the quote method.
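
    As a rough sketch of the placeholder approach: the helper below takes an already connected DBI handle and binds the user-supplied values at execute time instead of interpolating them into the SQL. The sub name, the table, and the column names are hypothetical.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: $dbh is a connected DBI database handle; the
    # products table and its columns are made up for illustration.
    sub insert_product {
        my ($dbh, $name, $description) = @_;

        # The SQL text contains only ? markers, never the user's data.
        my $sth = $dbh->prepare(
            'INSERT INTO products (name, description) VALUES (?, ?)'
        );

        # The values are bound here, so quotes, semicolons and the like
        # in $name or $description are treated as data, not as SQL.
        return $sth->execute($name, $description);
    }
    ```

    When a value really has to be interpolated into the statement text, $dbh->quote($value) is the other option mentioned above.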

    After Compline,
    Zaxo

      Good point. My concern is that this is not exactly safe if the Data Access Layer is unknown or not part of the package.

      In my case I'm building on top of Class::DBI, so as a third-party programmer I have no idea whether Class::DBI is playing it safe by using placeholders and quote. Sure, I can look for myself in the source, but what if I change the DAL to a completely different layer or module?

      When in doubt, the wise decision would still be not to accept any user input unless I'm sure it's safe, long before I pass it into a DAL. For that matter, let's assume there is no DAL. Then sanitizing user input is still my job.

        Class::DBI does use placeholders. Changing to a Perl-based DAL that doesn't use placeholders would be really, really stupid. DBI always supports placeholders (even if the underlying database doesn't), and the additional security and caching support they provide make it reckless not to use them. If there is no DAL, then it's up to you to use placeholders.

        I'm not saying to ignore input checking (I always do it even when I know placeholders will be there), but to put the problem in perspective.

        ----
        I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
        -- Schemer

        : () { :|:& };:

        Note: All code is untested, unless otherwise stated

Re: Preferred Way of Scrubbing User Input Before DB Write
by jdtoronto (Prior) on Feb 02, 2004 at 20:32 UTC
    jk2addict,

    Well, why not give us a short one to answer on Monday, hard ones are supposed to be for Friday aren't they?

    It really depends on what you expect. Using HTML::Sanitize is handy when you have a form on a web site. I don't use it a lot because my validator tends to handle most of the stuff.

    The heavy artillery in this battle is Data::FormValidator, which comes with Data::FormValidator::Constraints, all of which can be teamed up with Regexp::Common. The validator has a series of filters already included, and you can add others, so you can write a validator/filter combination for your descriptions that allows plain text with no HTML. But where it is somebody's sig file, you might want to allow limited HTML styling.
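
    A minimal sketch of such a profile, assuming Data::FormValidator is installed. The field names, the sub name, and the "no angle brackets" rule for descriptions are illustrative assumptions, not a recommendation for every form.

    ```perl
    use strict;
    use warnings;
    use Data::FormValidator;

    # Hypothetical profile: field names and rules are made up for illustration.
    my $profile = {
        required => [qw(name quantity)],
        optional => [qw(description)],
        filters  => ['trim'],    # built-in filter, applied to every field
        constraint_methods => {
            quantity    => qr/\A[1-9][0-9]{0,8}\z/,
            # Plain text only: reject anything containing < or >.
            description => qr/\A[^<>]*\z/,
        },
    };

    # Returns the valid fields plus the names of invalid and missing ones.
    sub check_product_form {
        my ($input) = @_;
        my $results = Data::FormValidator->check($input, $profile);
        return {
            valid   => scalar $results->valid,
            invalid => [ $results->invalid ],
            missing => [ $results->missing ],
        };
    }
    ```

    A description such as '<script>x</script>' would land in the invalid list, while the name and quantity still come back trimmed in the valid hash.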

    Apart from the constraints and rules already created, you can add your own using several different methods, and even create complex rules that involve multiple fields or complex relationships. The ability to use Regexp::Common is pretty darned amazing, but it is something I have never had need of, well, not yet! Maybe in a day or two, looking at my to-do list :)

    jdtoronto