OK, not having anything more at my disposal than CGI.pm, I'm planning on using these two functions to sanitize input data and escape JavaScript output.

I've set the functions up to accept and return either a single string or an array of strings.

Is there a faster way to do this? Am I setting myself up for problems?


update: replaced '#' with '&' in test example
use strict;
use warnings;

# strip any non-safe URL characters
# Note: This is not Data validation! Other
# code must verify/edit expected results
sub SafeURL {
    my @args = @_;
    local $_;
    foreach (@args) {
        s/[^\w\d.\@-]//gi if defined;
    }
    return wantarray ? @args : pop @args;
}

# Note: escape html covered by CGI escapeHTML()
# escape any non-safe javascript characters
sub EscapeJavaScript {
    my @args = @_;
    local $_;
    foreach (@args) {
        s/([^\w\d.\@-])/uc sprintf("%%%02x", ord($1))/egi if defined;
    }
    return wantarray ? @args : pop @args;
}

#####################
# test subs
my @array = qw(
    blah@&blah.blah/<test>
    lalalalal12340as-rqweousn
    //hokey/pokey
);

foreach (@array) {
    my $result1 = SafeURL($_);
    my $result2 = EscapeJavaScript($_);
    print "string: $_\n SafeURL: $result1\n EscapeJavaScript: $result2\n";
}

print "SafeURL array test: " . join(', ', SafeURL(@array)) . "\n";
print "EscapeJavaScript array test: " . join(', ', EscapeJavaScript(@array)) . "\n";
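
On the "is there a faster way" question, one alternative worth sketching is tr///, which deletes everything outside an allowed set in a single pass. This is only a sketch under the assumption that ASCII word characters are all that need to survive (tr/// cannot use \w, so the set is spelled out), and SafeURL_tr is a hypothetical name:

# Hypothetical tr///-based variant of SafeURL: deletes every character
# outside the listed set (ASCII letters, digits, _, ., @, -) in one pass.
sub SafeURL_tr {
    my @args = @_;
    for (@args) {
        tr/A-Za-z0-9_.@-//cd if defined;
    }
    return wantarray ? @args : $args[-1];
}

In scalar context this returns the last argument, mirroring the original's wantarray handling; whether it is actually faster is worth checking with the core Benchmark module (cmpthese) rather than assuming.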

Re: CGI param cleansing
by merlyn (Sage) on Jun 02, 2006 at 19:36 UTC
    I'm confused at the purpose of this code. There's nothing inherently dangerous in any character you can get from a browser via the param subroutine/method. Why do you think you need to "cleanse" them?

    And on the output, escapeHTML should take care of any hand-generated items, and the HTML generation subroutines should take care of the rest.

    What exactly is it that you think you need to "clean"?

    As an example, suppose I have a filename in $dangerous that could contain any character possible in a Unix pathname, and I want to both show its name and generate a link to it. All I have to do is this:

    use CGI qw(a escapeHTML);  # amongst other things
    ...
    print a({-href => $dangerous}, escapeHTML($dangerous));
    No extra code required.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      You are basically right; however, I don't think this line is correct.

      print a({-href => $dangerous}, escapeHTML($dangerous));
      In this case, you have to URI-escape the filename, except for the slashes. Suppose for example that the filename is "a?b<c". Then the above example would print <a href="a?b&lt;c">a?b&lt;c</a>. When the viewer clicks on the link, the browser will HTML-unescape the attribute and load a?b<c prepended with the current base URL. The web server, however, would interpret this as loading the file a with the GET parameter being b<c. The code should instead have printed <a href="a%3Fb%3Cc">a?b&lt;c</a>.
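      A minimal sketch of that fix (an assumption on my part: URI::Escape from CPAN is available, and the safe-character list below deliberately leaves slashes and the usual unreserved characters alone):

      use CGI qw(a escapeHTML);
      use URI::Escape qw(uri_escape);

      # URI-escape the path for the href (everything except unreserved
      # characters and '/'), and HTML-escape the visible link text.
      my $href = uri_escape($dangerous, q{^A-Za-z0-9_.~/-});
      print a({-href => $href}, escapeHTML($dangerous));

      With the "a?b<c" example above, $href becomes a%3Fb%3Cc, so the generated anchor matches what should have been printed.
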
      Sorry, I was being vague. Somehow, abstracting the function from what I was using it for has made it less useful.

      Where I use SafeURL() is for re-using input data as values in URL links, since these values (as I've been using them) generally tend to be nothing more than simple text/email strings.

      For example, if mycgi.pl was called as
      http://mycgi.pl?sect=test
      I would use SafeURL() on the value of 'sect' and use it to create new dynamic links such as
      http://mycgi.pl?sect=test;page=22
      So I guess the purpose of SafeURL() is to make the data safe to feed back into a new URL, but that appears to be specific to my use, and probably not of much use beyond that.
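      For that feed-back-into-a-URL case, here is a sketch of the escaping route rather than the stripping route (assuming CGI::escape(), the URL-encoding helper that ships with CGI.pm, is acceptable; the ;page=22 part just mirrors the example link above):

      use CGI;

      # Rebuild the example link by URL-encoding the incoming 'sect'
      # value instead of stripping characters from it.
      my $q    = CGI->new;
      my $sect = $q->param('sect');    # e.g. "test"
      my $link = 'mycgi.pl?sect=' . CGI::escape($sect) . ';page=22';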

      Seemed like a good idea at the time :)

      EscapeJavaScript() still has its uses, unless CGI provides an equivalent function?