When reading CGI form fields in a multilingual UTF-8 web application, it is not feasible to use the standard idiom for stripping evil characters:

$string =~ s/[^\w\s.,]//g; # plus any other characters you want to allow
# OR
($string) = $rawstring =~ m/([\w\s.,]+)/; # plus any others; undef if nothing matches

Since users will be sending all kinds of high bytes as part of multibyte UTF-8 sequences, I need to be more accepting, as I understand it.
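One possible way around this, sketched here under assumptions rather than offered as the accepted answer: if the raw bytes are decoded into Perl characters first, \w and \s use Unicode semantics and the whitelist idiom above keeps non-ASCII letters. The helper name clean_utf8_field is hypothetical, and the sample input is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of CGI.pm): decode raw UTF-8 bytes into
# characters, then apply the same whitelist idiom to the decoded string.
sub clean_utf8_field {
    my ($bytes) = @_;

    # With the default CHECK value, malformed sequences are replaced
    # by U+FFFD instead of raising an error.
    my $chars = decode('UTF-8', $bytes);

    # Keep word characters, whitespace, '.' and ','; drop everything else.
    $chars =~ s/[^\w\s.,]//g;
    return $chars;
}

# Made-up input: two CJK characters (as UTF-8 bytes) plus shell metacharacters.
my $cleaned = clean_utf8_field("\xE6\x97\xA5\xE6\x9C\xAC; rm -rf /");
```

Note that Unicode \w is broad (letters, digits, marks and connector punctuation from every script), so this is more permissive than the ASCII version; whether that is acceptable depends on where the value ends up.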

However, the issue described in CGI Security and the null byte problem persists. Since the null byte can be used to fool various resources, I am tempted to subclass CGI and have the param() method do a s/\x00//g on *everything*. Is this ill-advised? That is, might the null byte ever show up in valid UTF-8 text?
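For what it is worth, UTF-8 is designed so that the byte 0x00 appears only as the encoding of U+0000 itself: lead bytes of multibyte sequences fall in 0xC2-0xF4 and continuation bytes in 0x80-0xBF. A minimal sketch of the stripping step plus a spot check of that property follows; strip_nul is a hypothetical helper, it does not touch CGI.pm, and the code points are arbitrary samples.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return copies of the given values with every
# NUL byte removed. An overridden param() could pass its results
# through something like this.
sub strip_nul {
    my @values = @_;
    s/\x00//g for @values;
    return @values;
}

# Spot check: which of these sample code points produce a 0x00 byte
# when encoded as UTF-8?
my @samples  = (0x00, 0x41, 0xE9, 0x100, 0x4E2D, 0x10000);
my @with_nul = grep {
    index(encode('UTF-8', chr($_)), "\x00") >= 0
} @samples;

# Made-up example of the classic trick: a NUL hiding a file extension.
my ($safe) = strip_nul("report.txt\x00.html");
```

Only U+0000 ends up in @with_nul, so stripping \x00 should not damage any other character; it only removes literal NUL characters, which rarely belong in form text.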

Remember, CGI uploads of binary files are handled through a different mechanism, so those would not be affected by overriding param(). Does a wise man always strip null bytes from param() returns, and if so, why isn't that the default behavior?

In reply to Extra CGI.pm safety by stripping \x00 bytes? by rlucas
