Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

OK, I am trying to replace special characters with their ASCII codes. The problem is that if I use something like...
sub convert {
    my $in = shift;
    return ( '&#' . ord($in) . ';' );
}
$line =~ s/([\&\'\;\\\"\|\*\?\~\^\(\)\[\]\{\}\$\n\r\<\>])/&convert($1)/g;
it corrupts the html in $line. Any ideas? Thanks.

Replies are listed 'Best First'.
(jeffa) Re: preserving html in patterns
by jeffa (Bishop) on Dec 21, 2001 at 06:31 UTC
    A lot of times it appears that the only way to solve a problem is in one direction, but most of the time, there is another way. Instead of trying to substitute some set of characters, say X, match the negation of X:
    use strict;
    my $line = '$x += 2; $y = $x**2';
    $line =~ s/([^\sa-zA-Z0-9])/'&#'.ord($1).';'/ge;
    # or, if you don't want to encode underscores, use this instead
    # (running both would re-encode the & # ; just produced):
    # $line =~ s/([^\s\w])/'&#'.ord($1).';'/ge;
    print "$line\n";
    Because \s is part of the negated class, this leaves all whitespace alone: spaces, newlines and carriage returns. Why you would want to use &#10; or &#13; I don't know, but you can get them, while still leaving spaces unencoded, by replacing \s with a literal space:
    s/([^ a-zA-Z0-9])/'&#'.ord($1).';'/ge # a single, literal space
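
    To make the difference between the two character classes concrete, here is a minimal sketch. The sample string and the variable names ($text, $keep_ws, $keep_space) are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample input: one line of text ending in a newline.
my $text = "x = 1;\n";

# \s inside the negated class: every whitespace character is left alone,
# so the trailing newline survives as a real newline.
(my $keep_ws = $text) =~ s/([^\sa-zA-Z0-9])/'&#'.ord($1).';'/ge;

# A literal space instead of \s: spaces are left alone, but the newline
# is no longer excluded and becomes &#10;.
(my $keep_space = $text) =~ s/([^ a-zA-Z0-9])/'&#'.ord($1).';'/ge;

print $keep_ws;             # x &#61; 1&#59;  (followed by a real newline)
print "$keep_space\n";      # x &#61; 1&#59;&#10;
```

    The copy-then-substitute idiom, (my $copy = $text) =~ s/.../.../, is used only so both variants can start from the same input.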

    I also wonder if this tool is available on CPAN somewhere. But I digress...

    (updated node: got confused about \s and such, sorry)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    F--F--F--F--F--F--F--F--
    (the triplet paradiddle)
    
Re: preserving html in patterns
by davorg (Chancellor) on Dec 21, 2001 at 14:04 UTC

    Perhaps you should look at the escapeHTML function from CGI.pm.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      I'm almost tempted to point him to URI::Escape or CGI.pm's facilities, but there's also String::Escape. I've used all three with varying results; which one fits depends on exactly what he's trying to escape: strings, form elements, or URI structures.
Re: preserving html in patterns
by mrbbking (Hermit) on Dec 21, 2001 at 06:39 UTC

    I like Jeffa's answer, because the substitution pattern is easier to read.

    Alternatively, I think you could get away with just adding the 'e' modifier after your 'g'. That tells s/// to evaluate the right side as Perl code, so convert() is actually called, rather than use it at face value as a string.
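
    As a rough illustration of what the 'e' modifier changes, the sketch below runs the same substitution with and without it. The convert sub is the one from the question; the sample string, the shortened character class, and the names $literal and $encoded are assumptions made for the example.

```perl
use strict;
use warnings;

# The helper from the question: turn one character into a numeric entity.
sub convert {
    my $in = shift;
    return '&#' . ord($in) . ';';
}

# Hypothetical sample input.
my $line = 'a < b';

# Without /e the replacement is an interpolated string, so the text
# "&convert(...)" is inserted literally and the sub is never called.
(my $literal = $line) =~ s/([<>"])/&convert($1)/g;

# With /e the replacement is evaluated as Perl code for every match.
(my $encoded = $line) =~ s/([<>"])/convert($1)/ge;

print "$literal\n";   # a &convert(<) b
print "$encoded\n";   # a &#60; b
```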