Re^3: Hex regex fails in subroutine

If the data is always UTF-8 encoded, it might save some effort to decode it first and then look for the problem characters as characters rather than as byte sequences.

BTW, you can do all of this in a single pass with one substitution, if performance matters:

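sub convert_to_html_entities {
   my $str= shift;
   utf8::decode($str);   # UTF-8 bytes -> characters, in place
   # /e evaluates the replacement as code, /r returns the result (Perl 5.14+)
   return $str =~ s/[\x{201A}-\x{2122}]/ '&#'.ord($&).';' /ger;
}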
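
A quick check of that sub, as a sketch that assumes the input arrives as raw UTF-8 bytes. The sample string is made up for illustration:

```perl
use strict;
use warnings;

sub convert_to_html_entities {
    my $str = shift;
    utf8::decode($str);    # UTF-8 bytes -> characters, in place
    return $str =~ s/[\x{201A}-\x{2122}]/ '&#'.ord($&).';' /ger;
}

# \xE2\x80\x9C and \xE2\x80\x9D are the UTF-8 bytes for U+201C and U+201D
my $bytes = "\xE2\x80\x9Cquoted\xE2\x80\x9D";
print convert_to_html_entities($bytes), "\n";   # &#8220;quoted&#8221;
```

If the string has already been decoded elsewhere (for example by an :encoding(UTF-8) layer on the filehandle), the utf8::decode call should be dropped so the data is not decoded twice.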

You could even replace every character outside printable ASCII wholesale to completely sidestep the encoding problem. Note that the class below also matches control characters such as tabs and newlines:

sub convert_nonascii_to_html_entities {
   my $str= shift;
   utf8::decode($str);   # UTF-8 bytes -> characters, in place
   return $str =~ s/[^\x20-\x7E]/ '&#'.ord($&).';' /ger;
}
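
If line breaks and tabs should survive, a narrower class is one option. The following is only a sketch: the sub name and the [^\x00-\x7F] class are assumptions, not part of the code above, and it uses a capture group in place of $&:

```perl
use strict;
use warnings;

# Hypothetical variant: escape only code points above 0x7F.
sub convert_only_nonascii {
    my $str = shift;
    utf8::decode($str);    # UTF-8 bytes -> characters, in place
    return $str =~ s/([^\x00-\x7F])/ '&#'.ord($1).';' /ger;
}

my $bytes = "caf\xC3\xA9\n";           # "cafe" with e-acute, plus a newline
print convert_only_nonascii($bytes);   # caf&#233; followed by a real newline
```

With the [^\x20-\x7E] class, the same input would come out as caf&#233;&#10; because the newline (code point 10) is outside the printable range.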

Re^4: Hex regex fails in subroutine by AnomalousMonk (Archbishop) on Sep 30, 2023 at 13:40 UTC
See also haukex's article on dynamic regex alternations.

Give a man a fish: <%-{-{-{-<
Re^5: Hex regex fails in subroutine by NERDVANA (Priest) on Sep 30, 2023 at 22:50 UTC
Definitely a useful technique, but a single character class should perform much faster than an alternation list. Of course, you could use that technique to build the character class.
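
A minimal sketch of building the character class dynamically. The list of code points and the sub name are hypothetical and should be adjusted to the real data:

```perl
use strict;
use warnings;

# Hypothetical list of code points to escape.
my @problem_chars = (0x2018, 0x2019, 0x201C, 0x201D, 0x2122);

# Build one bracketed class such as [\x{2018}\x{2019}...] and compile it once.
my $class = join '', map { sprintf '\x{%X}', $_ } @problem_chars;
my $re    = qr/[$class]/;

sub convert_listed_chars {
    my $str = shift;
    utf8::decode($str);    # UTF-8 bytes -> characters, in place
    return $str =~ s/($re)/ '&#'.ord($1).';' /ger;
}

print convert_listed_chars("it\xE2\x80\x99s"), "\n";   # it&#8217;s
```

Spelling each character as a \x{...} escape avoids having to quote characters that are special inside a bracketed class, such as ] or ^ or -.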