rashley has asked for the wisdom of the Perl Monks concerning the following question:

Once upon a time I wrote a very simple sub to filter out non-printable characters on web-form input:
sub filterCharacters {
    my $text = shift;
    $text =~ s/[\000-\037]/ /g;
    $text =~ s/[\177-\777]/ /g;
    $text =~ s/\s+/ /g;
    return $text;
}
Then one day another developer came along and changed this sub to allow a character he thought was linefeed, but was actually backspace (he was looking at the decimal value, 10, instead of octal; octal 010 is backspace):
sub filterCharacters {
    my $text = shift;
    $text =~ s/[\000-\009]/ /g;
    $text =~ s/[\011-\037]/ /g;
    $text =~ s/[\177-\777]/ /g;
    $text =~ s/\s+/ /g;
    return $text;
}
So here's the weird part: this manifested itself as every instance of the character '9' getting changed to a space.

We've fixed the problem, but I can't for the life of me figure out how allowing backspace characters resulted in the 9's getting whacked.

Oh wise Monks, for the sake of my sanity and education, please enlighten me! Thanks.

Replies are listed 'Best First'.
Re: Puzzler - filtering characters
by Joost (Canon) on Oct 22, 2007 at 18:50 UTC
Re: Puzzler - filtering characters
by FunkyMonk (Bishop) on Oct 22, 2007 at 19:31 UTC
    Linefeed is 10 in decimal => 12 in octal, and you can combine your three substitutions into a single character class:

    $text =~ s/[\000-\011\013-\037\177-\377]/ /g;

    Or, using a POSIX character class, the much more readable

    s/[^[:print:]\n]/ /g;

    See perlre for a full list of the POSIX character classes.
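A minimal sketch of both suggestions side by side, in case it helps to see them run. The helper names strip_unprintable and strip_unprintable_posix and the sample string are made up for illustration. Only the character-class step is shown: the original sub's later s/\s+/ /g would still turn a linefeed into a space, since \s matches it. The sample sticks to ASCII because how [:print:] treats bytes above \177 can depend on the string's encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every control or high-bit byte except
# linefeed (octal 012) with a space, using the combined octal class.
sub strip_unprintable {
    my ($text) = @_;
    $text =~ s/[\000-\011\013-\037\177-\377]/ /g;
    return $text;
}

# Hypothetical helper: the same idea written with the POSIX class.
sub strip_unprintable_posix {
    my ($text) = @_;
    $text =~ s/[^[:print:]\n]/ /g;
    return $text;
}

# Tab, linefeed, backspace, a literal 9, and DEL in one sample string.
my $input = "a\tb\nc\x08d9\x7f";

for my $filter (\&strip_unprintable, \&strip_unprintable_posix) {
    my $output = $filter->($input);
    # Print code points so the invisible characters can be seen.
    print join(' ', map { ord($_) } split //, $output), "\n";
}
# Both lines print: 97 32 98 10 99 32 100 57 32
```

The linefeed (10) and the '9' (57) survive; tab, backspace, and DEL become spaces (32).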

Re: Puzzler - filtering characters
by FunkyMonk (Bishop) on Oct 22, 2007 at 18:42 UTC
    9 isn't an octal digit!

      I realize that. So it just used the ASCII value?

        You meant

        ... $text =~ s/[\000-\009]/ /g; ... }

        But it does something like

        ...
        $text =~ s/[\000-\000]/ /g; # matches only NUL, so close to a no-op
        $text =~ s/[9]/ /g;
        ...
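To see this directly, here is a small sketch that lists which code points the broken class really matches. The helper name matched_codes is made up for illustration, and the "intended" class is only a guess at what the other developer meant. Some Perl versions warn about the early-terminated octal escape, so that warning category is switched off around the broken pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: list the code points (0..255) that a compiled
# character class matches.
sub matched_codes {
    my ($re) = @_;
    return grep { chr($_) =~ $re } 0 .. 255;
}

# The class from the broken sub. Perl reads \00 as an octal escape (NUL),
# stops at the non-octal digit 9, and keeps that 9 as a literal character.
my @broken = do {
    no warnings 'regexp';    # newer Perls may warn about the short escape
    matched_codes(qr/[\000-\009]/);
};

# Assumed intent: NUL through octal 007.
my @intended = matched_codes(qr/[\000-\007]/);

print "broken:   @broken\n";      # broken:   0 57
print "intended: @intended\n";    # intended: 0 1 2 3 4 5 6 7
```

Code point 57 is the character '9', which is why every 9 in the form input turned into a space.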

        BTW, the tr/\000-\011/ / operator may sometimes be much faster than s///g for this kind of single-character replacement, but that may depend on the data.
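A rough sketch of the tr approach, extended (as an assumption, not part of the suggestion above) to cover the same ranges as the combined class while leaving linefeed alone. The name strip_unprintable_tr is hypothetical. Whether it is actually faster would need measuring, for example with the core Benchmark module.

```perl
use strict;
use warnings;

# Hypothetical helper: map control and high-bit bytes (except linefeed,
# octal 012) to a space with tr instead of a regex substitution.
sub strip_unprintable_tr {
    my ($text) = @_;
    # The replacement list is shorter than the search list, so tr reuses
    # its last character (the space) for every remaining search character.
    $text =~ tr/\000-\011\013-\037\177-\377/ /;
    return $text;
}

my $cleaned = strip_unprintable_tr("a\tb\nc\x08d9\x7f");

# Print code points so the invisible characters can be seen.
print join(' ', map { ord($_) } split //, $cleaned), "\n";
# prints: 97 32 98 10 99 32 100 57 32
```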

        Regards

        mwa

Re: Puzzler - filtering characters
by andyford (Curate) on Oct 22, 2007 at 18:49 UTC

    I looked up an ASCII code table and it says that 008 & 009 are not codes for anything, so your bad character class was actually hitting the number nine. 'Works' for 008 => 8 too.

    Update: Never mind, should have looked up 'octal'.

    non-Perl: Andy Ford