Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a CGI script where I need to take some user input, clean it, and redisplay it on a Web page. To eliminate a security concern, I am taking dangerous characters and converting them to their character code equivalents (e.g. & = & #38; -- all character codes in this example have an extra space after the first ampersand to guarantee that they are not converted when displayed). Unfortunately, this winds up with hideous monstrosities like the following:
s/(&|;)/($1 eq "&")?"& #38;":"& #59;"/ge;
This substitutes the appropriate character code for either the ampersand or the semicolon as it encounters it in the variable. I have to do both in the same substitution because the character codes themselves contain ampersands and semicolons: substituting for one would create false positives for the other if the other were substituted for later. While I'm rather proud of that regex (for a newbie), I have to admit that it's not terribly clear. Is there an easier way to scrub incoming data that will be tossed out to a Web page later on?
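
A minimal sketch of the false-positive problem described above, written with the real character codes (no extra display space). The sample string and the variable names are hypothetical; only core Perl is used.

```perl
use strict;
use warnings;

my $input = 'a & b; c';    # hypothetical user input

# Two separate passes: the second pass re-encodes the ';' that the
# first pass just introduced as part of "&#38;".
(my $two_pass = $input) =~ s/&/&#38;/g;
$two_pass =~ s/;/&#59;/g;

# One pass: each character of the original input is examined exactly
# once, so freshly inserted codes are never touched again.
my %code = ('&' => '&#38;', ';' => '&#59;');
(my $one_pass = $input) =~ s/([&;])/$code{$1}/g;

print "$two_pass\n";    # a &#38&#59; b&#59; c   (corrupted)
print "$one_pass\n";    # a &#38; b&#59; c       (intended)
```

The lookup hash does the same job as the ternary in the regex above, and adding another character only means adding one more key and one more character in the class.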

Replies are listed 'Best First'.
Re: Safer (and cleaner) way to print user-supplied text.
by btrott (Parson) on Jun 08, 2000 at 21:28 UTC
    Would HTML::Entities do what you're asking?
    use HTML::Entities;
    my $encoded = encode_entities( $input, "&;" );
    The second argument provides the "unsafe" characters that you want encoded. perldoc HTML::Entities.
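
A slightly fuller sketch of the same idea, assuming HTML::Entities is installed (it is a CPAN module, not part of core Perl). The sample string and the choice of `<>&"` as the unsafe set are assumptions made for illustration; they are not taken from the question.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

my $input = '<b>Tom & "Jerry"</b>';    # hypothetical user input

# Encode only the characters named in the second argument.
my $encoded = encode_entities($input, '<>&"');

# decode_entities in non-void context returns a decoded copy and
# leaves its argument alone.
my $decoded = decode_entities($encoded);

print "$encoded\n";    # &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
print "$decoded\n";    # <b>Tom & "Jerry"</b>
```

Because the module makes a single pass over the string, the ordering problem from the question does not come up.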
Re: Safer (and cleaner) way to print user-supplied text.
by cwest (Friar) on Jun 08, 2000 at 21:39 UTC
    Here's an approach without using another module:
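    # Map each "unsafe" character to its replacement.
    my $bad  = {
                '&' => '&amp;',
                '|' => '&#124;', # numeric code; '&pipe;' is not a standard entity
                ';' => '&#59;',  # numeric code, as in the original question
               };
    # Build a character class from the keys, escaping any regex metacharacters.
    my $find =  join '', map { quotemeta } keys %{$bad};
    $string  =~ s/([$find])/$bad->{$1}/g;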
    
    Enjoy
    --
    Casey
    
Re: Safer (and cleaner) way to print user-supplied text.
by swiftone (Curate) on Jun 08, 2000 at 21:48 UTC
    This isn't terribly clear, but it has the advantage of easily being able to define new "trouble" characters:
    # assuming input is in $string
    $string = join('', map( m![<>\\/]! ? "&#".ord($_).";" : $_, split(//, $string)));
    #                         ^^^^^^^ These are trouble characters
Re: Safer (and cleaner) way to print user-supplied text.
by Ovid (Cardinal) on Jun 09, 2000 at 00:13 UTC
    Thanks for all the responses. However, the prize belongs to btrott for choosing door #1. It's a great solution. One of the things that I love about Perl is that if I need help, someone's always there to hold my hand. I love you guys :)
      You really have to hand it to these guys who answered. Some people posted help that uses a module and others posted help without one, which I think is always the best way of learning.

      Modules are great but sometimes you may not have access to them.

      Big Joe
        Yes, I have to agree that you're right and I really didn't give enough credit to the ones who did it "by hand", so to speak. They had some nifty "more than one way" answers.

        "If I heard a voice from heaven say 'Live without loving,'
        I'd beg off. Girls are such exquisite hell." -- Ovid