gman has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I am trying to replace ® with the HTML entity &reg;

This does not work: $contents =~ s/\®//\&reg;/g;

Thanks in advance

Solution:

my $x = chr(174); $contents =~ s/$x/\&reg;/g;

Thanks to all who replied

Replies are listed 'Best First'.
Re: matching special char
by pc88mxer (Vicar) on May 05, 2008 at 20:29 UTC
    I have a feeling that you are dealing with UTF8 encoded data, but you also could be dealing with code-points, and we need to figure out which is the case.

    When you refer to ®, can you tell if it is stored as two perl characters or just one? That is, if you were to isolate ® into a string variable (say $x), what would print length($x) display?

    Here's an example that illustrates the difference:

    my $x = chr(174);
    binmode STDOUT, ':utf8';
    print "x has length ", length($x), " >>$x<<\n";
    and this emits:
    x has length 1 >>®<<
    So, even though $x has length 1, it is written out as two bytes (0xC2 0xAE) because of the :utf8 layer. On the other hand, the string itself could also have length 2:
    my $x = chr(194).chr(174);
    binmode STDOUT, ':bytes';
    print "x has length ", length($x), " >>$x<<\n";
    and this emits:
    x has length 2 >>®<<

    The upshot is that if $x has length 1, your string probably contains Unicode code-points, and you'll likely want to look into using the encode_entities function from the module HTML::Entities. This is a general way to convert code-points to HTML entity references.
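
    A minimal sketch of the encode_entities route, assuming HTML::Entities is installed (it ships with the HTML-Parser distribution on CPAN, not with core Perl). The sample string is made up for illustration:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# chr(174) is the single code point U+00AE (REGISTERED SIGN).
my $contents = 'foo' . chr(174) . 'bar';

# Called with one argument, encode_entities escapes control characters,
# characters above 0x7F, and the characters < & > " '.
my $encoded = encode_entities($contents);

print "$encoded\n";    # foo&reg;bar
```

    Note that this converts every such character in the string, not only ®, which may or may not be what you want if the text already contains markup.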

    On the other hand, if $x has length 2, then your string probably contains UTF-8 encoded bytes. You would then likely find it advantageous to convert that UTF-8 byte stream into Unicode code-points using the decode function from the Encode module like this:

    use Encode; my $code_points = decode('utf8', $x);
    The reason you would like to use code-points in your program rather than UTF8 bytes is that perl is much more adept at handling strings when they are stored as code-points.
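
    As a rough illustration of that difference, the sketch below decodes the two-byte form and then runs a one-character substitution on the result. It uses only the core Encode module; the variable names and the sample string are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two bytes, 0xC2 0xAE: the UTF-8 encoding of U+00AE.
my $bytes = chr(194) . chr(174);

# decode() turns the byte string into a string of code points.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, chars: length %d, code point %d\n",
    length($bytes), length($chars), ord($chars);

# A one-character pattern now matches the decoded text.
(my $html = "foo${chars}bar") =~ s/\x{AE}/&reg;/g;
print "$html\n";    # foo&reg;bar
```

    Had the substitution been run against the undecoded bytes, \x{AE} would have matched only the second byte and left a stray 0xC2 behind.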

      Thanks for your reply,

      I tested the string:

      my $x = chr(174);
      binmode STDOUT, ':utf8';
      print "x has length ", length($x), " >>$x<<\n";

      It showed up as one char, so I used:

      my $x = chr(174); $contents =~ s/$x/\&reg;/g;

      This results in the proper substitution. I did search an extended ASCII table for the symbol, but somehow missed it. I will be looking up more information on the two solutions you showed.

      Thanks again,

Re: matching special char
by mwah (Hermit) on May 05, 2008 at 19:19 UTC

    Aside from the error pointed out by others already - is the data from some HTML source?

    Maybe it's &#174; or &#xAE;?

    $contents =~ s/\&#174;/\&reg;/g;
    $contents =~ s/\&#xAE;/\&reg;/g;
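
    If the source does turn out to contain numeric character references, the two substitutions could be folded into one pattern. This is only a sketch on a made-up sample string; it also accepts the lower-case hex spelling:

```perl
use strict;
use warnings;

# Hypothetical input containing the decimal and hex references.
my $contents = 'Acme&#174; and Acme&#xAE; and Acme&#xae;';

# Match the decimal form (&#174;) or the hex form (&#xAE;, any letter case).
$contents =~ s/&#(?:174|[xX][aA][eE]);/&reg;/g;

print "$contents\n";    # Acme&reg; and Acme&reg; and Acme&reg;
```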

    Can you tell us more about the source?

    Regards

    mwa

Re: matching special char
by toolic (Bishop) on May 05, 2008 at 19:09 UTC
    use warnings;
    use strict;
    my $contents = 'foo®bar';
    $contents =~ s/®/\&reg;/g;
    print "contents=:$contents:\n";

    prints:

    contents=:foo&reg;bar:
Re: matching special char
by apl (Monsignor) on May 05, 2008 at 19:09 UTC
    Off the top of my head, you should replace the double slash with a single slash.

    I assume you aren't using use strict; use warnings;. If you were, you'd probably get errors on the regexp.