in reply to untainting unicode text using \w

Wouldn't it be better to encode the mail body using base64 or quoted-printable? That way you don't need to worry about what's in the body at all.

In general, if you can't strictly validate input (i.e. match it against known-good data), it's better to make the process completely indifferent to the input. It's the same reason that using placeholders with DBI is better than grepping for (un)safe characters.
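A minimal sketch of the encoding suggestion, assuming the body is a Perl character string and the mail is sent as UTF-8. It uses the core modules MIME::Base64 and MIME::QuotedPrint. The sample body and the header lines printed are illustrative only, not taken from the original thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical body text: arbitrary Unicode, including a lone "." line
# that would otherwise need special care when sent over SMTP.
my $body = "First line\n.\nGr\x{fc}\x{df}e aus M\x{fc}nchen\n";

# Both encoders work on bytes, not characters, so convert to UTF-8 octets
# first (encode_base64 dies on wide characters otherwise).
my $octets = encode('UTF-8', $body);

# Base64: output contains only A-Z a-z 0-9 + / = and newlines,
# whatever the original body contained.
my $b64 = encode_base64($octets);

# Quoted-printable alternative: stays mostly readable, while non-ASCII
# bytes become =XX escapes (e.g. "=C3=BC" for the UTF-8 form of u-umlaut).
my $qp = encode_qp($octets);

# Headers announcing the encoding, a blank line, then the encoded body.
print "Content-Type: text/plain; charset=UTF-8\n";
print "Content-Transfer-Encoding: base64\n";
print "\n";
print $b64;
```

Base64 makes the transmitted body fully independent of its content; quoted-printable keeps plain ASCII text legible in the raw message, which may be preferable for mostly-English mail.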

Re^2: untainting unicode text using \w
by danmcb (Monk) on Aug 25, 2007 at 00:52 UTC

    Ah! Yes, it probably would. I have to admit that although I'd heard of these encodings, I had never checked what they actually are, nor had it occurred to me that this might be a useful side effect.

    Thanks, Joost!