Re: Extra CGI.pm safety by stripping \x00 bytes?

by graff (Chancellor)
on May 26, 2005 at 23:15 UTC ( [id://460898]=note )


in reply to Extra CGI.pm safety by stripping \x00 bytes?

As indicated in the corrections to the initial reply, if you are accepting/expecting strings of utf8 bytes in your CGI query/param string -- and this is all supposed to be character data (as opposed to miscellaneous binary or hacker poison) -- then you should not be expecting any nulls, and can safely filter those out before doing anything else, if you like. This will do no damage to utf8 character data.
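A minimal sketch of that filtering step, assuming the value is still a raw byte string straight from the query (strip_nulls is a made-up helper name, not anything CGI.pm provides):

```perl
use strict;
use warnings;

# Hypothetical helper: delete NUL bytes from a raw (undecoded) parameter value.
sub strip_nulls {
    my ($bytes) = @_;
    $bytes =~ tr/\x00//d;    # tr with /d deletes every matching byte
    return $bytes;
}

# "caf\xC3\xA9" is the utf8 byte sequence for "cafe" with an acute accent;
# the embedded NUL is the sort of poison we want gone.
my $raw   = "caf\xC3\xA9\x00.txt";
my $clean = strip_nulls($raw);
```

The two bytes of the accented character come through untouched, since no byte of a multi-byte utf8 sequence is ever zero.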

(It's only when you use UTF-16 (BE or LE) that you get null bytes in a Unicode stream, and in such cases the null bytes are mostly the high bytes of what would otherwise be the plain ASCII+Latin1 set: U+0000-U+00FF.)
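To see the difference for yourself, here is a small sketch using the Encode module; the two-character sample string is just an illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two characters: "A" (U+0041) and e-acute (U+00E9).
my $chars = "A\x{E9}";

my $utf8    = encode('UTF-8',    $chars);   # bytes 41 C3 A9
my $utf16be = encode('UTF-16BE', $chars);   # bytes 00 41 00 E9

# tr/// without /d just counts the matching bytes.
my $nulls_in_utf8  = ($utf8    =~ tr/\x00//);
my $nulls_in_utf16 = ($utf16be =~ tr/\x00//);
```

The utf8 form contains no nulls, while the UTF-16BE form carries one null high byte per character.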

If you are using Perl 5.8.x and are converting the parameter string to perl-internal utf8 strings (scalars having their "utf8-flag" set, e.g. by using the Encode::decode() function), then your suggested regex will work fine, because "\w" matches all "letters and numbers" plus underscore (not just the 63 ASCII word characters, but also the Cyrillic, Greek, Arabic, etc).
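Roughly what that looks like, as a sketch: the /\A\w+\z/ pattern here only stands in for whatever whitelist regex you actually use, and the sample bytes are the utf8 encoding of a six-letter Cyrillic word.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw utf8 bytes as they would arrive in the query string (12 bytes).
my $bytes = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

# Decode into a perl-internal character string (utf8 flag set, 6 characters).
my $text = decode('UTF-8', $bytes);

# Stand-in whitelist: word characters only.
my $decoded_ok = ($text  =~ /\A\w+\z/) ? 1 : 0;   # Cyrillic letters count as \w
my $bytes_ok   = ($bytes =~ /\A\w+\z/) ? 1 : 0;   # high bytes are not ASCII word chars
```

The point is that the same pattern behaves differently on the decoded string and on the raw bytes, so decode first and match afterwards.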

Actually, contrary to past wisdom, it might be easier to specify the set of characters you want to exclude: particularly, the ones in the ASCII range that have magical meanings for things like perl regexes, the shell, SQL, etc. This is actually a small and easily specified set, compared to all the miscellaneous multilanguage punctuation that folks might send you. All of that punctuation consists of multi-byte sequences with high bits set, so it can't trigger anything worse than "invalid input" when misused in vulnerable contexts.
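As a sketch of the exclusion approach: the character class below is only my guess at a reasonable starting set (quotes, backslash, dollar sign and common shell metacharacters), so adjust it to the contexts your script actually feeds. has_risky_ascii is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed blacklist of ASCII characters with special meaning somewhere
# (shell, SQL, perl). This set is illustrative, not authoritative.
my $risky = qr/[`'"\\\$;|&<>*?]/;

# Hypothetical helper: true if the string contains any blacklisted character.
sub has_risky_ascii {
    my ($text) = @_;
    return $text =~ $risky ? 1 : 0;
}

my $ascii_quote = has_risky_ascii("O'Reilly");                  # ASCII apostrophe
my $wide_quotes = has_risky_ascii("\x{300C}quoted\x{300D}");    # CJK corner brackets
```

The wide corner brackets pass because they are nowhere near the ASCII range; only the plain apostrophe is flagged.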

(I've been noticing a lot more people using "wide-character" versions of various quotes and brackets -- this tends to have the side effect of avoiding a variety of vulnerabilities involving the use of the ASCII versions of these characters in certain contexts.)

For that matter, if you're accepting non-ASCII utf8 data via CGI (ASCII is a valid subset of utf8), then you must already be taking care to make sure that this stuff is not misused in your script: if it's going into a database via SQL, then you must be using "?" placeholders in your prepared SQL statements; if it's going into a local file, you must not be using the data to name the file; you surely are not including it in any way in any sort of "system()", backtick or other shell activity, and so on.
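For the local-file case, one way to keep submitted data out of the file name is to let the script pick the name itself. A minimal sketch with File::Temp; save_submission, the "upload_" prefix and the ".dat" suffix are all made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# Hypothetical helper: store submitted bytes under a name chosen by the
# script. Nothing supplied by the client takes part in building the path.
sub save_submission {
    my ($dir, $bytes) = @_;
    my ($fh, $path) = tempfile('upload_XXXXXX', DIR => $dir, SUFFIX => '.dat');
    binmode $fh;
    print {$fh} $bytes or die "write failed: $!";
    close $fh or die "close failed: $!";
    return $path;
}

# Demo in a scratch directory that is removed when the script exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = save_submission($dir, "some submitted bytes");
```

If you need the client's original name for display, store it as data alongside the file rather than using it on the filesystem.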

It makes sense to do basic sanity checks on the data (e.g. no null bytes or non-printing ASCII control characters), but beyond that, you shouldn't need to strip it down much to make it "safe", because you shouldn't be doing anything "dangerous" with it in the first place.
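Such a sanity check could be as small as the following sketch. looks_sane is a hypothetical name, and letting tab, newline and carriage return through is my assumption about what a text field should allow:

```perl
use strict;
use warnings;

# Hypothetical check: reject NUL and non-printing ASCII control characters,
# but allow tab (\x09), newline (\x0A) and carriage return (\x0D).
sub looks_sane {
    my ($text) = @_;
    return $text =~ /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/ ? 0 : 1;
}

my $multi_line = looks_sane("line one\nline two");   # ordinary text
my $with_null  = looks_sane("bad\x00value");         # embedded NUL
my $with_esc   = looks_sane("esc\x1B[0m");           # terminal escape sequence
```

Here the check rejects the input outright instead of silently repairing it; which of the two you want depends on the application.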
