Wally Hartshorn has asked for the wisdom of the Perl Monks concerning the following question:

If the user enters text with some characters that aren't allowed, I would like to strip out the invalid characters and keep the rest.

I can do that like so. (This is a much-simplified pattern.)

$input =~ s/[^a-zA-Z]//g;

However, it appears that doing the above will not officially untaint $input. Passing it to a DBI operation causes the program to bail out with a "tainted data" error. In order to officially untaint it, I have to capture a match, like so:

$input =~ /([a-zA-Z]+)/; $input = $1;

Unfortunately, that will capture only the text up to the first invalid character, discarding everything after that.
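To make the truncation concrete, here is a minimal sketch. The sample string is made up for illustration, and the script is meant to run without -T, so it shows only what the match captures, not the taint flag itself.

```perl
use strict;
use warnings;

# Hypothetical sample input containing some invalid characters.
my $input = "abc123def!ghi";

# A single match captures only the first run of letters;
# everything after the first invalid character is lost.
my $first = '';
if ($input =~ /([a-zA-Z]+)/) {
    $first = $1;
}

print "$first\n";    # prints "abc"
```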

How can I capture all of the valid text?

I could do this:

$input =~ s/[^a-zA-Z]//g;       # keep valid stuff
($input) = $input =~ /(.*)/;    # now officially untaint it

But obviously that doesn't feel exactly kosher.

Thanks in advance!

Wally Hartshorn

(Plug: Visit JavaJunkies, PerlMonks for Java)

Re: Keeping all valid characters when untainting
by Anonymous Monk on Dec 22, 2003 at 17:56 UTC
    $input = join '', $input =~ /([a-zA-Z]+)/g;

    Edited by Chady -- added code tags.
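A short sketch of why the one-liner above keeps every valid character: in list context, a match with /g returns all captures, so join reassembles the string purely from captured text. The sample strings and variable names here are hypothetical, and the script runs without -T, so it demonstrates the resulting values rather than the taint flag.

```perl
use strict;
use warnings;

# Hypothetical sample input containing some invalid characters.
my $input = "abc123def!ghi";

# In list context, m//g returns every captured run of letters.
my @runs = $input =~ /([a-zA-Z]+)/g;

# Joining the captures rebuilds the string from captured text only.
my $clean = join '', @runs;
print "$clean\n";    # prints "abcdefghi"

# If nothing matches, the list is empty and the result is the empty string.
my $none = join '', "1234" =~ /([a-zA-Z]+)/g;
print "[$none]\n";   # prints "[]"
```

Compared with the substitute-then-capture approach, this checks each piece against the real pattern instead of blessing the whole string with /(.*)/.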

      Posting a question and getting an answer almost immediately is a mixed blessing. On the one hand, I'm glad to have an answer quickly. On the other hand, I really wish someone had said "gee, that's a tough one". :-)

      Thanks for the quick reply!

      By the way, after correcting the small typo, the correct answer is:

      $input = join '', $input =~ /([a-zA-Z]+)/g;

      Wally Hartshorn
