Re^7: FCGI, tied handles and wide characters

"is there anything really wrong with the following?"

It depends on what you mean by "really wrong". It will run, but I would not choose to use it in production, for these reasons:

- Encode::_utf8_on comes with this caveat in the Encode documentation: "The following API uses parts of Perl's internals in the current implementation. As such, they are efficient but may change in a future release." It would not be good if a future version suddenly broke it.
- It performs no validity checking on its input whatsoever. The first time it is fed non-UTF-8 input, it will corrupt your data (at best!).
- As its documentation notes, it does not work on tainted values, so it won't run under taint mode. That alone should indicate that it is not suitable for public-facing use.
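To illustrate the second point, here is a small sketch (the sample string is made up) that feeds Latin-1 bytes, which are not valid UTF-8, to both Encode::_utf8_on and Encode::decode. It uses utf8::valid as the consistency check; note that utf8::valid is documented as an internal function intended mainly for Perl's own test suite.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical input: "café" in Latin-1, which is NOT valid UTF-8
# (0xE9 is a UTF-8 lead byte with no continuation bytes after it).
my $latin1 = "caf\xE9";

# Encode::_utf8_on only flips Perl's internal UTF-8 flag; nothing is checked.
my $flagged = $latin1;
Encode::_utf8_on($flagged);
printf "_utf8_on:       flag=%d valid=%d\n",
    utf8::is_utf8($flagged) ? 1 : 0,
    utf8::valid($flagged)   ? 1 : 0;    # flag is on, but the buffer is malformed

# Encode::decode validates its input; with the default CHECK it substitutes
# U+FFFD for malformed bytes instead of producing a corrupt string.
my $decoded = Encode::decode('UTF-8', $latin1);
printf "Encode::decode: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $decoded;
```

The point is that _utf8_on happily produces a string Perl believes is characters but whose internal buffer is malformed, while Encode::decode at least yields well-formed output.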

Have you benchmarked it to see how much faster it really is compared with Encode::decode()? Always benchmark before optimising.
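A minimal benchmark sketch using the core Benchmark module might look like the following. The sample input and the candidate names are hypothetical; real lines read from the FCGI handle would give more meaningful numbers. It also checks that every candidate returns the same string before timing anything, since a fast wrong answer is worthless.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Encode ();

# Hypothetical sample input: a few lines of UTF-8 encoded bytes.
my $bytes = Encode::encode('UTF-8', "na\x{EF}ve caf\x{E9} \x{263A}\n" x 4);

# Each candidate takes bytes and returns a character string.
my %candidate = (
    encode_decode => sub { Encode::decode('UTF-8', $_[0]) },
    utf8_decode   => sub { my $s = $_[0]; utf8::decode($s); $s },
    utf8_on       => sub { my $s = $_[0]; Encode::_utf8_on($s); $s },
);

# Sanity check first: all candidates must agree on valid input.
my $expected = $candidate{encode_decode}->($bytes);
for my $name (sort keys %candidate) {
    die "$name disagrees\n" unless $candidate{$name}->($bytes) eq $expected;
}

# Wrap each candidate in a closure over the sample, then time them
# (a negative count means "run each for at least that many CPU seconds").
my %timed;
for my $name (keys %candidate) {
    my $f = $candidate{$name};
    $timed{$name} = sub { $f->($bytes) };
}
cmpthese(-1, \%timed);
```

Agreement on valid input says nothing about invalid input, which is exactly where these approaches differ.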

🦛


Re^8: FCGI, tied handles and wide characters by Maelstrom (Beadle) on Sep 21, 2024 at 09:46 UTC
It might have been different with a bigger file, but my benchmarks show it to be considerably faster than Encode::decode():

                     Rate implicit encode_decode utf_decode utf8_on
    implicit      34929/s       --          -31%       -60%    -60%
    encode_decode 50765/s      45%            --       -41%    -43%
    utf_decode    86322/s     147%           70%         --     -2%
    utf8_on       88314/s     153%           74%         2%      --

It was a pleasant surprise to see `utf8::decode` get so close, though. Given that `utf8::decode` won't protect me from non-UTF-8 input either, I guess the optimal solution is:

    $line = Encode::decode('UTF-8', $line) unless (utf8::decode($line));
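The fallback idea can be wrapped in a small helper; `decode_line` is a hypothetical name used only for this sketch. It relies on utf8::decode returning false and leaving its argument unmodified when the bytes are not valid UTF-8, and on Encode::decode's default CHECK behaviour of substituting U+FFFD for malformed bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: try the cheap built-in first. utf8::decode decodes
# in place and returns true on success; on malformed input it returns false
# and leaves the string alone, so only then fall back to Encode::decode,
# which (with the default CHECK) replaces malformed bytes with U+FFFD.
sub decode_line {
    my ($line) = @_;    # work on a copy of the caller's bytes
    $line = Encode::decode('UTF-8', $line) unless utf8::decode($line);
    return $line;
}

my $good = decode_line("caf\xC3\xA9");    # UTF-8 bytes for "café"
my $bad  = decode_line("caf\xE9");        # Latin-1 bytes for "café"
printf "good: %d chars, bad ends with U+%04X\n", length $good, ord substr($bad, -1);
```

One caveat: utf8::decode accepts Perl's extended (lax) UTF-8, so it can accept sequences such as encoded surrogates that Encode's strict 'UTF-8' rejects; the two paths are not exactly equivalent. If malformed input should be an error rather than silently repaired, Encode::FB_CROAK can be passed as the third argument to Encode::decode.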
Re^9: FCGI, tied handles and wide characters by ikegami (Patriarch) on Sep 21, 2024 at 16:16 UTC
"given utf8::decode won't protect me from non-utf8 input either"

It does. It returns false.
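A quick sketch of that return value in action, using made-up sample strings: one valid UTF-8 byte sequence and one Latin-1 sequence that is not valid UTF-8.

```perl
use strict;
use warnings;

# utf8::decode works in place and reports success: true for well-formed
# UTF-8, false otherwise. On failure the string is not modified.
my %sample = (
    valid   => "caf\xC3\xA9",    # UTF-8 encoding of "café" (5 bytes)
    invalid => "caf\xE9",        # Latin-1 "café": not valid UTF-8 (4 bytes)
);

my %result;
for my $name (sort keys %sample) {
    my $s  = $sample{$name};     # decode a copy, keep the original for comparison
    my $ok = utf8::decode($s) ? 1 : 0;
    $result{$name} = { ok => $ok, text => $s };
    printf "%-7s returned %s, string now has %d characters\n",
        $name, $ok ? 'true' : 'false', length $s;
}
```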