in reply to User-Defined Case Mappings

I'm curious as to how this handles multiple sections of the UTF space simultaneously, but nevermind that ;-)

First off, your print statement is after your return statement. It'll never execute under any circumstances.

Second, your $tim string isn't flagged as UTF-8 internally, so the custom mapping never comes into play. Try decoding it into a character string using the Encode module.
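A quick way to see the difference is to peek at the flag with utf8::is_utf8. This is only a sketch for checking your own installation: flag_state is a helper name made up for illustration, and utf8::upgrade is shown as a possible alternative to decoding, not something the original script used. What the decoded string reports can depend on your Encode version, so the script prints the result and leaves the judging to you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: report whether Perl stores the scalar
# with its internal UTF8 flag set.
sub flag_state {
    my ($label, $str) = @_;
    return sprintf "%-10s UTF8 flag %s",
        $label, utf8::is_utf8($str) ? "on" : "off";
}

my $bytes    = "abcdef";                        # plain literal, no "use utf8"
my $decoded  = Encode::decode('utf8', $bytes);  # the fix suggested above
my $upgraded = $bytes;
utf8::upgrade($upgraded);                       # assumption: forcing the flag on also works

print flag_state('literal',  $bytes),    "\n";
print flag_state('decoded',  $decoded),  "\n";
print flag_state('upgraded', $upgraded), "\n";
```

If the decoded line reports the flag as off on your setup, that would explain uc() ignoring the ToUpper sub.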

This is what I got to work:

    #!/usr/bin/perl
    use strict;
    use Encode;

    sub ToUpper {
        print "Here we are\n";
        return<<END;
    0061\t0063\t0041
    END
    }

    #my $tim = "abcdef";
    my $tim = Encode::decode('utf8',"abcdef");
    my $t2 = uc($tim);
    print "[$t2]\n";

Good luck,

Re^2: User-Defined Case Mappings
by graff (Chancellor) on Feb 24, 2009 at 02:58 UTC
    I'm curious as to how this handles multiple sections of the UTF space simultaneously, but nevermind that ;-)

    Well, it doesn't, of course. The working code that you posted essentially disables all lower-to-upper case conversions except for the first three ASCII lower-case letters. Here's a version that handles a couple of different ranges (warning to potential users: STDOUT includes utf8 wide characters):

        #!/usr/bin/perl
        use strict;
        use warnings;

        binmode STDOUT,":utf8";

        my $tim = "abcdef \x{ff41}\x{ff42}\x{ff43}\x{ff44}\x{ff45}\x{ff46}";
        print "main::uc( $tim ) => ", uc($tim), "\n";

        sub ToUpper {
            return <<END;
        0061\t0063\t0041
        ff41\tff43\tff21
        END
        }

    But the description of "user-defined case mappings" in the perlunicode man page seems to be lacking something, IMO -- to wit: why would anyone want this? It does not seem to provide the same sort of usefulness that you get with user-defined character classes (described in the previous section of the man page).

    I tried to see if I could make different packages with different case mappings, and it didn't work as hoped for -- in fact, it appears that the first package to define the "ToUpper" and other case-relation functions will set the case relations immutably for the rest of the script.

    Here's a test, which I tried two different ways, once calling the two package subs in the order shown, then in the other order. The second sub call always gives the same result as the first call (i.e. both calls always use the mapping created by the first call):

    I have to admit, I don't see the point of this feature, except to make up some really wicked obfu.

    (updated to add readmore tags)

Re^2: User-Defined Case Mappings
by timgreenwood (Initiate) on Feb 24, 2009 at 16:48 UTC
    Thank you, Tanktalus - my first visit to the monastery has been very successful. I should have realized the problem, having used the Encode module before. I do still see one (easily avoidable) issue: contrary to the description in http://perldoc.perl.org/Encode.html, decode_utf8(string) is not working as a synonym (in this case) for decode('utf8',string). I am using perl v5.8.5 built for x86_64-linux-thread-multi; could this be an implementation problem? The snippet below shows it.

        #!/usr/bin/perl
        use strict;
        use Encode;

        sub ToUpper {
            return<<END;
        0061\t0063\t0041
        END
        }

        # Below fails
        my $tim = decode_utf8("abcdef");
        # But this one works
        #my $tim = decode('utf8',"abcdef");
        print uc($tim),"\n";

    I will respond to the other questions separately.
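
To narrow down whether the decode_utf8 difference described above comes from the installed Encode, one could compare the two calls side by side and print the versions involved. This is a rough diagnostic sketch, not a statement of how any particular Encode release behaves; describe is a helper name invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Hypothetical helper: show a decoded result's content and
# whether its internal UTF8 flag is set.
sub describe {
    my ($str) = @_;
    return sprintf "'%s' (UTF8 flag %s)",
        $str, utf8::is_utf8($str) ? "on" : "off";
}

my $via_synonym = decode_utf8("abcdef");
my $via_decode  = decode('utf8', "abcdef");

# Versions matter here, so print them alongside the results.
printf "perl %vd, Encode %s\n", $^V, $Encode::VERSION;
print "decode_utf8:    ", describe($via_synonym), "\n";
print "decode('utf8'): ", describe($via_decode),  "\n";
```

If the two lines report different flag states, the mismatch is in the installed Encode rather than in the ToUpper sub, and upgrading Encode from CPAN would be the first thing to try.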