Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I have a question where I have a sub in my Perl code that does some checks on users input. Someone have updated my code to handle uninitialized values by adding an else at the end of the subroutine, I don't like the way it is done. Can some one tell me if this is a good way of doing it and if I should remove this code at the end and do the check another way.
I also think, if this is code for security, knowing that I have a possibility that could show a open door for problems having an else in there
Here is the code
sub cleanup { if ( $_[0] ) { $_[0]=~s/<//g; $_[0]=~s/>//g; $_[0]=~s/;//g; $_[0]=~s/\(//g; $_[0]=~s/\)//g; $_[0]=~s/\"//g; $_[0]=~s/\'//g; $_[0]=~s/\?//g; $_[0]=~s/script//g; $_[0]=~s/\///g; $_[0]=~s/>//g; $_[0]=~s/\*//g; return $_[0]; } else { # I don't think this is the right way of doing it! return ''; } }

Thanks!

Replies are listed 'Best First'.
Re: Return Question in Subroutine
by eric256 (Parson) on Apr 28, 2006 at 14:47 UTC

    I don't know if you are aware but that code is actualy modifying the string in place AND returning it. Since you are using $_[0] (which is an alias) and changes made to it are made to the string passed to it. Sometimes this is expected, and sometimes it isn't. Sense you are using it on param i don't think it matters, but it might lead to some unexpected results. Below is the code I would use assuming that the codes job is to process an incoming param and make ti usabel. To me that cleanup process should also invovled hiding the fact that it was undef.

    use strict; use warnings; my $str = "<some> text with';*()/? nasty chars. scripted is ok, but no +t script"; sub cleanup { my $p = shift; return '' unless defined $p; $p =~ tr|<>;()"'?/*||d; $p =~ s/\bscript\b//g; return $p; } sub cleanup2 { $_[0] =~ tr|<>;()"'?/*||d; $_[0] =~ s/\bscript\b//g; return $_[0]; } print "\$str is: $str\n"; print "Cleaned up: " . cleanup($str) . "\n"; print "\$str is: $str\n"; print "Cleaned up: " . cleanup2($str) ."\n"; print "\$str is: $str\n";

    This outputs:

    $str is: <some> text with';*()/? nasty chars. scripted is ok, but not +script Cleaned up: some text with nasty chars. scripted is ok, but not $str is: <some> text with';*()/? nasty chars. scripted is ok, but not +script Cleaned up: some text with nasty chars. scripted is ok, but not $str is: some text with nasty chars. scripted is ok, but not

    Notice how the last print of $str is already cleaned, even though we never said $str = cleanup($str); . While this could be the desired behaviour it could also be very confusing for anyone using your code.


    ___________
    Eric Hodges
Re: Return Question in Subroutine
by philcrow (Priest) on Apr 28, 2006 at 13:27 UTC
    With apologies to Dr. Conway, I might write:
    sub cleanup { return unless defined $_[0]; # remainder of code with no if around it. }
    This gives back the undef the caller gave you (but in the proper context). As pointed out in earlier comments, that may not be what you want. To me this is sensible. It takes this routine out of the equation, leaving the caller to repair its own undefs.

    I might also name the variable something. Some people would do something to make $_[0] into $_, so it doesn't have to appear on each line (but I'm not one of those people).

    Phil

      So this is how it should be:
      sub cleanup { return unless defined $_[0]; $_[0]=~s/\'/\\\'/g; $_[0]=~s/<//g; $_[0]=~s/>//g; $_[0]=~s/;//g; $_[0]=~s/\(//g; $_[0]=~s/\)//g; $_[0]=~s/\"//g; $_[0]=~s/\'//g; $_[0]=~s/\?//g; $_[0]=~s/script//g; $_[0]=~s/\///g; $_[0]=~s/>//g; }
Re: Return Question in Subroutine
by GrandFather (Saint) on Apr 28, 2006 at 12:57 UTC

    You need to consider what the alternatives are and how the calling code might handle them.

    One way to handle it is to die. The calling code doesn't need to do anything special, but it is very likely that you don't want to kill the program at that point. You can set up a handler so that the die is caught and managed in some fashion, but that is way outside of anything we can advise on without knowing a lot more about what your code is doing.

    You could return undef which can be detected as a special case and handled by the calling code. If it is not handled, but you have warnings turned on (you do use warnings; don't you?) then chances are the program will issue warnings when the returned value is undef and subsequently used.

    Or you can use the current code which again can be detected as a special case by the calling code, but will not generate warnings in most code.

    However, without knowing what the calling code is doing and how it might handle errors, it is rather hard to give a good definitive answer.


    DWIM is Perl's answer to Gödel
      Hi,
      I do have warning, strict all that turned on, but someone here don't like to see in the error logs the use of unnitialized errors on it, this sub checks the input from a form like this, simply:
      my $test = filter(param( 'name' ));

      I still think that a sub like that shouldn't offer anymore options other than check for bad characters trying to make into the application.

        You can probably clean up your cleanup code a little like this:

        use strict; use warnings; my $str = "<some> text with';*()/? nasty chars. scripted should be ok, + but not script"; print cleanup($str); sub cleanup { $_[0]=~tr|<>;()"'?/*||d; $_[0]=~s/\bscript\b//g; return $_[0]; }

        Prints:

        some text with nasty chars. scripted should be ok, but not

        DWIM is Perl's answer to Gödel
Re: Return Question in Subroutine
by Herkum (Parson) on Apr 28, 2006 at 13:25 UTC
    When you do your return, you should not return an empty string. It works as long as you return it as just a string which gets check but lets suppose this,
    my @result = $object->cleanup; if (@result) { print "Got some results back '@result'\n" } __END__ # # You will always get this back if you do return ''; # Got some results back ''
    You always return undef for nothing and your code will work in the way you think and it is easier to read. Just replace that last bit with this.
    return $_[0]; } return;
      That is a very good point "you should not return an empty string", but isn't it this way of not having the log file printing the initialized value error?
Re: Return Question in Subroutine
by GrandFather (Saint) on Apr 28, 2006 at 13:24 UTC

    Consider too what happens with cleanup ("0");.


    DWIM is Perl's answer to Gödel
Re: Return Question in Subroutine
by thundergnat (Deacon) on Apr 28, 2006 at 14:40 UTC

    Somewhat OT; in general, I find it easier to specify which characters to allow, rather than trying to anticipate what might be used against me.

    print cleanup(q/!s@#cr%ipt^&script?M:r;"._ \'W\\][iz][{}=a=+-r-d`~\''/ +); sub cleanup { my $string = shift; if ( defined $string ){ $string =~ s/[^\p{Alnum}., ]//g; $string =~ s/script//g; } return $string; }
Re: Return Question in Subroutine
by nimdokk (Vicar) on Apr 28, 2006 at 13:00 UTC
    You probably don't need the "else" in there, simply return at the end. If your "if-statement" did anything, the return in that block would happen first and you would never get to the second return. If nothing happens, your code would simply skip down to the final return and go back to the main program. I don't think it hurts anything to have it in there, and might be useful in the future if you wanted the routine to do something different if $_[0] was undefined. You might want to ask whomever added it to find out their reasoning for putting the else in there.
    A reply falls below the community's threshold of quality. You may see it by logging in.