in reply to Re^2: Memory Leak with XS but not pure C
in thread Memory Leak with XS but not pure C

If you were feeding the 'uc' operator a string of UTF-8 bytes from your editor, and perl had not been told that the string was meant as Unicode, then perl would apply ASCII uppercasing rules to that string of bytes. Now that you have "use utf8" in your file, I think you'll find that 'uc' works properly on that string. But you'll also find that perl can warn you ("Wide character in print") when you try to print that string, because in the default configuration the output streams expect bytes as input. You can either use binmode(STDOUT, ':encoding(UTF-8)') to declare that you intend to always write Unicode to the file handle, or remember to encode the string before printing.
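Roughly, the difference looks like this. It's only a minimal sketch: the sample string and variable names are made up for illustration, and it assumes the source file itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                       # string literals in this file are decoded as characters
use Encode qw(encode decode);

my $chars = "café";                    # 4 characters, thanks to "use utf8"
my $bytes = encode('UTF-8', $chars);   # the same text as 5 raw UTF-8 bytes

my $uc_chars = uc $chars;   # Unicode rules apply: the accented letter is uppercased
my $uc_bytes = uc $bytes;   # byte string: only the ASCII letters change

# Declare once that STDOUT takes characters and should emit UTF-8 ...
binmode(STDOUT, ':encoding(UTF-8)');
print "$uc_chars\n";                       # CAFÉ
print decode('UTF-8', $uc_bytes), "\n";    # CAFé

# ... or leave the handle alone and encode by hand before each print:
# print encode('UTF-8', $uc_chars), "\n";
```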

Full unicode support exists in perl, but yeah it's kind of a learning curve to find it :-(   But that's the price we pay for full multi-decade back-compat.

Re^4: Memory Leak with XS but not pure C
by FrankFooty (Novice) on Mar 30, 2025 at 07:31 UTC

    Yeah, you are right. This will be part of a bigger XS thing. Is there a macro I can use for uppercasing?

      Actually I ran into this problem with my Tree::RB::XS module when I wanted to case-fold the keys. The 'uc' operator doesn't have a clean alternative C API available. There are API calls for single characters like 'toUPPER_utf8' but I didn't dig enough to find out if there's a robust way to call this in a loop for all the different versions of perl. The implementation of the uc operator (grep for "pp_uc" in pp.c) has a bunch of ifdef conditionals which have probably changed a lot over the years.

      Since I want to support back to 5.8, I decided to just call out to the perl function with call_pv("CORE::fc", G_SCALAR). But, as the nearby comments mention, before perl 5.16 that wasn't a function, so I needed to wrap the op in a function, sub _fc_impl { lc shift }, and then call that.
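      The Perl side of that wrapper idea could look something like this. It's a sketch and not the actual Tree::RB::XS code: the version check and the string-eval are my assumptions about one way to do it, and only the name _fc_impl and the lc fallback come from what I described above.

```perl
use strict;
use warnings;

# Pick the implementation once, at compile time. The bodies live in
# string-evals so that a perl older than 5.16 never has to parse CORE::fc.
BEGIN {
    if ($] >= 5.016) {
        # fc (full Unicode case folding) was added in 5.16; the CORE::
        # prefix makes it usable without "use feature 'fc'".
        eval q{ sub _fc_impl { CORE::fc(shift) } 1 } or die $@;
    }
    else {
        # Older perls: lowercase is the closest built-in approximation.
        eval q{ sub _fc_impl { lc shift } 1 } or die $@;
    }
}

print _fc_impl("HeLLo"), "\n";   # hello
```

      From XS you would then push the key onto the stack and call the wrapper by its fully qualified name with call_pv(..., G_SCALAR), so the C code never needs to know which branch was compiled.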

      Since calling perl functions is a decent bit of overhead, if you need this to run in a hot code path you might still be better off with your external unicode library. Or, if you want to avoid that dependency and stick to recent versions of perl you could just copy/paste most of the pp_uc implementation into your own function and call that (but careful with copyrights there).

      And... um... if you get a reasonably robust version made with the perl API, I'd love to improve the performance of Tree::RB::XS :-)

        OK seems there is no simple way

        cheers!