I use taint mode in all of my CGI programs and am starting to wonder if I'm being too restrictive in some cases.
Generally, when 'free text' input is required, I use a regex to ensure it contains only \w characters and a small
number of punctuation characters, and I substitute line breaks with <br> tags.
The data I'm talking about here will be stuffed into a database (using placeholders)
and displayed again as HTML (going through CGI's escapeHTML method).
It will not be used as a filename, passed to system calls, etc.
I'm now in the position of wanting to allow similarly 'free text' Unicode input, and I don't know what I can
realistically allow.
I'm quite tempted to allow anything other than the null byte, which is the only thing I can think of that might
mess up either the database insertion or the HTML display.
However, my practice has always been to make sure the data contains only what I do want to allow, rather than to strip out what I don't.
I've Super Searched for "taint unicode" and haven't found anything that really helps.
I've read the core Perl Unicode docs and understand how to untaint using Unicode character classes.
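For reference, here is a minimal sketch of the kind of allowlist untainting I mean. The sub name untaint_free_text, the particular set of Unicode properties, and the 2000-character limit are all assumptions made up for illustration, not a recommendation. It also assumes the input has already been decoded from bytes into Perl characters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the untainted text, or undef when the input
# contains anything outside the allowlist. Assumes $input is already decoded
# into Perl characters (not raw UTF-8 bytes).
sub untaint_free_text {
    my ($input) = @_;
    return unless defined $input;

    # Allow letters, combining marks, numbers, punctuation, space separators
    # and CR/LF. Symbols (\p{S}, which covers < > + = $ among others) and
    # control characters such as the null byte are deliberately left out.
    # The length cap of 2000 characters is an arbitrary assumption.
    if ( $input =~ /\A([\p{L}\p{M}\p{N}\p{P}\p{Zs}\r\n]{1,2000})\z/ ) {
        return $1;    # a captured group is what untaints the value
    }
    return;
}
```

The capture into $1 is what clears the taint flag. Whether leaving out \p{S} is too strict for real-world text is exactly the kind of thing I'm unsure about.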
Can anyone give me some advice or real-world examples?
Does perlmonks.org use taint mode, and if so, how does it untaint the
Seekers of Perl Wisdom "Your question" input?