As indicated in the corrections to the initial reply, if you are accepting/expecting strings of utf8 bytes in your CGI query/param string -- and this is all supposed to be character data (as opposed to miscellaneous binary or hacker poison) -- then you should not be expecting any nulls, and can safely filter those out before doing anything else, if you like. This will do no damage to utf8 character data.
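A minimal sketch of that filtering step, assuming the raw parameter value is already sitting in a scalar. The helper name `strip_nulls` and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: delete every NUL byte from a byte string.
# In UTF-8 the byte 0x00 only ever encodes U+0000 itself, so removing
# it cannot damage any other character.
sub strip_nulls {
    my ($bytes) = @_;
    $bytes =~ tr/\x00//d;
    return $bytes;
}

# "caf\xC3\xA9" is "cafe" with an acute accent, as UTF-8 bytes,
# followed by a stray NUL.
my $raw   = "caf\xC3\xA9\x00.txt";
my $clean = strip_nulls($raw);
printf "%d bytes in, %d bytes out\n", length $raw, length $clean;
```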

(Null bytes only show up in a Unicode text stream when you use UTF-16 or UTF-32 (BE or LE); in UTF-16, the null bytes are the high bytes of what would otherwise be the plain ASCII+Latin1 set: U+0000-U+00FF.)
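To see the difference, here is a small sketch using the core Encode module. The helper `hex_bytes` and the sample text are assumptions for illustration only:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $text = "Hi\x{e9}";   # 'H', 'i', and U+00E9 (e with acute accent)

# UTF-8 never emits a 00 byte for these characters; UTF-16 pads each
# one with a 00 high byte.
print "UTF-8:    ", hex_bytes(encode('UTF-8',    $text)), "\n";
print "UTF-16BE: ", hex_bytes(encode('UTF-16BE', $text)), "\n";
print "UTF-16LE: ", hex_bytes(encode('UTF-16LE', $text)), "\n";
```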

If you are using Perl 5.8.x and are converting the parameter string to perl-internal utf8 strings (scalars having their "utf8-flag" set, e.g. by using the Encode::decode() function), then your suggested regex will work fine, because "\w" represents all "letters and numbers" plus underscore (not just the ASCII letters and digits, but also the Cyrillic, Greek, Arabic, etc).
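Roughly, the decode-then-match step might look like the following. This is a sketch only: the pattern `/\A\w+\z/` stands in for whatever regex was suggested in the thread, and `is_all_word_chars` is a made-up name:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical check: decode UTF-8 bytes into a Perl character string,
# then test that it consists only of "word" characters.
sub is_all_word_chars {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # now characters, not bytes
    return $chars =~ /\A\w+\z/ ? 1 : 0;
}

my $greek = "\xCE\xB1\xCE\xB2\xCE\xB3";    # alpha, beta, gamma as UTF-8 bytes
print "greek letters: ", (is_all_word_chars($greek)  ? "ok" : "rejected"), "\n";
print "with a quote:  ", (is_all_word_chars("a'b")   ? "ok" : "rejected"), "\n";
```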

Actually, contrary to past wisdom, it might be easier to specify the set of characters you want to exclude: particularly, ones in the ASCII range that have magical meanings for things like perl regexes, the shell, SQL, etc. This is actually a small and easily specified set, compared to all the miscellaneous multilanguage punctuation that folks might send you. All of that punctuation consists of multi-byte tuples with high bits set, so it can't trigger anything worse than "invalid input" when misused in vulnerable contexts.
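One way to express such an exclusion set is a single character class. The exact list of characters below is an assumption, not something settled in this thread, so adjust it to the contexts your data can actually reach; `has_ascii_meta` is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical deny-list: ASCII punctuation with special meaning to the
# shell, SQL, or Perl regexes. The membership is a guess for illustration.
my $ASCII_META = qr/[\$`'"\\;|&<>(){}\[\]*?!#~^%=]/;

sub has_ascii_meta {
    my ($str) = @_;
    return $str =~ $ASCII_META ? 1 : 0;
}

# The second sample uses U+2019 (right single quote) as UTF-8 bytes:
# every byte has its high bit set, so the ASCII class cannot match it.
for my $input ("Robert'); DROP TABLE", "O\xE2\x80\x99Brien") {
    printf "%-22s %s\n", $input, has_ascii_meta($input) ? "rejected" : "accepted";
}
```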

(I've been noticing a lot more people using "wide-character" versions of various quotes and brackets -- this tends to have the side effect of avoiding a variety of vulnerabilities involving the use of the ASCII versions of these characters in certain contexts.)

For that matter, if you're accepting non-ASCII utf8 data via CGI (ASCII is a valid subset of utf8), then you must already be taking care to make sure that this stuff is not misused in your script: if it's going into a database via SQL, then you must be using "?" placeholders in your prepared SQL statements; if it's going into a local file, you must not be using the data to name the file; you surely are not including it in any way in any sort of "system()", backtick or other shell activity, and so on.

It makes sense to do basic sanity checks on the data (e.g. no null bytes or non-printing ASCII control characters), but beyond that, you shouldn't need to strip it down much to make it "safe", because you shouldn't be doing anything "dangerous" with it in the first place.
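A basic sanity check of that kind could be sketched like this. Letting tab, LF and CR through is an assumption made here (tighten it for single-line fields), and `looks_sane` is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical sanity check: reject NUL and other non-printing ASCII
# control characters (plus DEL), but allow tab, LF and CR.
sub looks_sane {
    my ($str) = @_;
    return $str =~ /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/ ? 0 : 1;
}

for my $input ("two\tcolumns\n", "ring the bell\x07", "embedded\x00null") {
    (my $shown = $input) =~ s/([^\x20-\x7E])/sprintf '\\x%02x', ord $1/ge;
    printf "%-24s %s\n", $shown, looks_sane($input) ? "ok" : "rejected";
}
```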


In reply to Re: Extra CGI.pm safety by stripping \x00 bytes? by graff
in thread Extra CGI.pm safety by stripping \x00 bytes? by rlucas
