in reply to Faster utf8 escaping.
Another thing to bear in mind is that you can often get a long way with core modules that almost do what you want.
For example, you were using Unicode::Escape because it promised to conveniently turn non-ASCII characters into JavaScript escape sequences. And this made sense, because that's what you wanted to end up with. Did you look at the core Encode module, though? There's at least one way to use it to solve your problem, and in my tests -- using your test cases -- it comes out about 35% faster than your hand-rolled version, while also providing the extras Unicode::Escape gives you, like handling non-UTF-8 encodings or invalid UTF-8.
The key thing is the CHECK parameter, which controls what happens to characters that don't fit in the destination encoding. Set your destination encoding to ASCII, and every character you want to escape gets handed to CHECK. Outputting JavaScript escape sequences isn't one of the options Encode provides, but it can emit other sorts of escape sequence (Perl-style \x{HHHH}, HTML &#NNN;, or XML &#xHHHH;). You can then translate those into the JavaScript syntax in an efficient single pass:
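    use Encode ();

    sub escape_with_encode {
        # Shortcut: pure-ASCII input needs no escaping.
        return $_[0] unless $_[0] =~ /[\x80-\xff]/;
        my $s = shift;
        # Recode UTF-8 bytes to ASCII; unmappable characters become &#xHHHH;
        Encode::from_to($s, "utf-8", "ascii", Encode::FB_XMLCREF);
        # The hex width varies (e.g. &#xe9;), so pad to four digits.
        # Code points above U+FFFF would still need surrogate pairs.
        $s =~ s/&#x([0-9a-fA-F]+);/sprintf "\\u%04x", hex $1/ge;
        return $s;
    }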
Note that I've retained your shortcut at the start of the sub -- it still provides a significant speed boost.
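As a sketch of a variation (not benchmarked, so the 35% figure above doesn't apply to it): Encode also accepts a code reference as CHECK, which receives the code point of each unmappable character and returns the replacement text. That skips the intermediate XML form and leaves room to emit surrogate pairs for characters above U+FFFF. The names escape_with_coderef and js_escape_codepoint are made up for illustration, and the sketch assumes the input is a UTF-8 byte string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: format one code point as a JavaScript escape,
# splitting anything above U+FFFF into a UTF-16 surrogate pair.
sub js_escape_codepoint {
    my ($cp) = @_;
    return sprintf '\\u%04x', $cp if $cp <= 0xFFFF;
    $cp -= 0x10000;
    return sprintf '\\u%04x\\u%04x',
        0xD800 + ($cp >> 10), 0xDC00 + ($cp & 0x3FF);
}

# Hypothetical variant of the sub above: takes UTF-8 bytes, returns ASCII.
sub escape_with_coderef {
    my ($bytes) = @_;
    # Same shortcut as before: nothing to do for pure ASCII.
    return $bytes unless $bytes =~ /[\x80-\xff]/;
    # Decode to characters first; encode() then calls the coderef
    # for every character that has no ASCII representation.
    my $chars = decode('UTF-8', $bytes);
    return encode('ascii', $chars, \&js_escape_codepoint);
}

print escape_with_coderef("caf\xc3\xa9"), "\n";    # prints caf\u00e9
```

Whether this beats the from_to version will depend on how many characters need escaping, since the callback runs once per character; I'd measure with your own test cases before switching.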
Re^2: Faster utf8 escaping.
by ikegami (Patriarch) on Apr 07, 2009 at 22:20 UTC

Re^2: Faster utf8 escaping.
by kyle (Abbot) on Apr 08, 2009 at 01:05 UTC