in reply to Extra CGI.pm safety by stripping \x00 bytes?

Certainly the null byte can appear in utf-8; code points like \x2400, \x2500, . . . all have them.

The poison null cracks from perl all occur where C code looks at perl strings and takes them as null-delimited C strings. That typically happens when the string is fed to system and interpreted by the shell. Your caution is justified there, but not as a blanket ban on null bytes.
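A minimal sketch of that mismatch. It assumes a made-up filename, and unpack's Z* template is used only as a stand-in for how C code would read the same buffer:

```perl
use strict;
use warnings;

# Hypothetical user-supplied name with an embedded null byte.
my $name = "report.txt\0.jpg";

# Perl tracks the length explicitly, so it sees the whole string.
my $perl_len = length $name;

# A C routine treating the buffer as a null-terminated string stops at
# the first \0. unpack 'Z*' mimics that view of the same bytes.
my $c_view = unpack 'Z*', $name;

print "Perl sees $perl_len characters\n";
print "C would see: $c_view\n";
```

A check that the name ends in .jpg passes in Perl, while the C side would only ever see report.txt.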

Update: Oops! Thanks, guys++, I didn't know that.

After Compline,
Zaxo

Re^2: Extra CGI.pm safety by stripping \x00 bytes?
by dave_the_m (Monsignor) on May 26, 2005 at 20:58 UTC
    "Certainly the null byte can appear in utf-8; code points like \x2400, \x2500, . . . all have them"
    No, utf8 is specifically formulated to avoid null bytes in any codepoints apart from zero, eg
    $ perl586 -MDevel::Peek -e'Dump "\x{2400}"'
    SV = PV(0x8181f00) at 0x816e234
      REFCNT = 1
      FLAGS = (POK,READONLY,pPOK,UTF8)
      PV = 0x817c268 "\342\220\200"\0 [UTF8 "\x{2400}"]
      CUR = 3
      LEN = 4
    That codepoint is represented by three bytes, none of which is zero.

    Dave.

Re^2: Extra CGI.pm safety by stripping \x00 bytes?
by marnanel (Beadle) on May 26, 2005 at 21:16 UTC
    There's no null byte in the UTF-8 encoding of \x2400 (it's E2, 90, 80). Null bytes shouldn't appear in UTF-8 streams unless actually representing a null character: it's part of the design of UTF-8 that any byte <128 represents itself.
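One way to check this from Perl without Devel::Peek, as a minimal sketch using the core Encode module (the variable names are only for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode U+2400 to UTF-8 octets and list them in hex.
my $octets = encode('UTF-8', "\x{2400}");
my @hex    = map { sprintf '%02X', ord($_) } split //, $octets;
print "@hex\n";                      # E2 90 80

# Look for a zero byte anywhere in the encoded form.
my $has_nul = index($octets, "\0") >= 0 ? 'yes' : 'no';
print "contains NUL: $has_nul\n";    # contains NUL: no
```

The same test on encode('UTF-8', "\x{0}") would report a zero byte, since U+0000 is the one code point that encodes to it.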
Re^2: Extra CGI.pm safety by stripping \x00 bytes?
by rlucas (Scribe) on May 26, 2005 at 19:56 UTC
    OK - thanks for clarifying that for me. I understood the nature of the crack as described by Ovid in his node (and by others elsewhere on the web). In fact, I'm not anticipating sending anything to system(), and I'm tainting things.

    However, when I send utf8 text to other external C programs (databases, for example, or sendmail), should I take special caution in those cases?