bpphillips has asked for the wisdom of the Perl Monks concerning the following question:
The regex in question, from HTML::Entities, is:

    s/([^\n\r\t !\#\$%\'-;=?-~])/$char2entity{$1} || num_entity($1)/ge

In all the tests I've done (perl v5.6.1 and v5.8.1), this first regular expression only ever matches the first byte of a multi-byte character rather than both bytes.

If I change that regex to include a utf8 character in the pattern (which, according to the "Important Caveats" section of perldoc's perlunicode page, makes the regex compiler recognize multi-byte characters), it works:

    my $foo = "\x{263A}";
    $$ref =~ s/([^\n\r\t !\#\$%\'-;=?-~]|$foo)/$char2entity{$1} || num_entity($1)/ge;

I hate patching stock modules like this because they become very hard to maintain.
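The byte-versus-character behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the HTML::Entities implementation: `encode_sketch` and `num_entity_sketch` are hypothetical stand-ins for the module's internals, and the script assumes a perl that bundles the Encode module (5.8 or later, so not 5.6.1). It applies the same character class to a string of raw UTF-8 bytes and to the same text decoded into characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the module's numeric-entity fallback.
sub num_entity_sketch { sprintf '&#x%X;', ord $_[0] }

# Apply the character class from the post to a copy of the input.
sub encode_sketch {
    my ($text) = @_;
    $text =~ s/([^\n\r\t !\#\$%\'-;=?-~])/num_entity_sketch($1)/ge;
    return $text;
}

my $bytes = "\xE2\x98\xBA";            # the three UTF-8 bytes of U+263A
my $chars = decode('UTF-8', $bytes);   # the same text as one character

print encode_sketch($bytes), "\n";     # prints &#xE2;&#x98;&#xBA;
print encode_sketch($chars), "\n";     # prints &#x263A;
```

If the input reaches the substitution as undecoded bytes, the regex sees each byte as a separate character, so decoding the data first may avoid the need to patch the module at all. Whether that applies here depends on how the string was read in, which the post does not say.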
Re: HTML::Entities and multi-byte characters
by iburrell (Chaplain) on Sep 13, 2004 at 19:48 UTC
by bpphillips (Friar) on Sep 13, 2004 at 20:35 UTC
by iburrell (Chaplain) on Sep 13, 2004 at 22:07 UTC
by bpphillips (Friar) on Sep 14, 2004 at 14:31 UTC