I'm curious as to how this handles multiple sections of the UTF space simultaneously, but nevermind that ;-)

Well, it doesn't, of course. The working code that you posted essentially disables all lower-to-upper case conversions except for the first three ASCII lower-case letters. Here's a version that handles a couple of different ranges (warning to potential users: STDOUT will contain UTF-8 wide characters):

#!/usr/bin/perl
use strict;
use warnings;

binmode STDOUT, ":utf8";

my $tim = "abcdef \x{ff41}\x{ff42}\x{ff43}\x{ff44}\x{ff45}\x{ff46}";
print "main::uc( $tim ) => ", uc($tim), "\n";

# Each row is: range start, range end, mapped start (hex, tab-separated)
sub ToUpper {
    return <<END;
0061\t0063\t0041
ff41\tff43\tff21
END
}

But the description of "user-defined case mappings" in the perlunicode man page seems to be lacking something, IMO -- to wit: why would anyone want this? It does not seem to provide the same sort of usefulness that you get with user-defined character classes (described in the previous section of the man page).

I tried to see if I could make different packages with different case mappings, and it didn't work as hoped for -- in fact, it appears that the first package to define the "ToUpper" and other case-relation functions will set the case relations immutably for the rest of the script.

Here's a test, which I tried two different ways, once calling the two package subs in the order shown, then in the other order. The second sub call always gives the same result as the first call (i.e. both calls always use the mapping created by the first call):

#!/usr/bin/perl
use strict;
use warnings;

binmode STDOUT, ":utf8";

my $tim = "abcdef \x{ff41}\x{ff42}\x{ff43}\x{ff44}\x{ff45}\x{ff46}";

# Swap these two calls to try the other order
Foo1Case::foo( $tim );
Foo2Case::foo( $tim );

package Foo1Case;

# Maps a-c (and fullwidth a-c) to A-C
sub ToUpper {
    return <<END;
0061\t0063\t0041
ff41\tff43\tff21
END
}

sub foo {
    my $str = shift;
    print "Foo1Case::uc( $str ) => ", uc($str), "\n";
}

package Foo2Case;

# Maps a-c (and fullwidth a-c) to D-F
sub ToUpper {
    return <<END;
0061\t0063\t0044
ff41\tff43\tff24
END
}

sub foo {
    my $str = shift;
    print "Foo2Case::uc( $str ) => ", uc($str), "\n";
}
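
For comparison, here is a rough sketch of how two independent mappings could coexist if the range table is applied by hand rather than through the ToUpper hook. It is not how perl implements the feature internally; make_upper is a hypothetical helper I made up for illustration, and it only assumes the same "start, end, mapped start" hex rows used in the tables above:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): build an upper-casing closure
# from a table of "start\tend\tmapped-start" hex rows.
sub make_upper {
    my ($table) = @_;
    my %map;
    for my $row (grep { /\S/ } split /\n/, $table) {
        my ($start, $end, $to) = map { hex($_) } split /\t/, $row;
        # Each code point in start..end maps to the same offset from $to.
        for my $offset (0 .. $end - $start) {
            $map{ chr($start + $offset) } = chr($to + $offset);
        }
    }
    return sub {
        my ($str) = @_;
        # Characters outside the table pass through unchanged.
        return join '', map { exists $map{$_} ? $map{$_} : $_ } split //, $str;
    };
}

# Two separate mappings, each held in its own closure.
my $upper1 = make_upper("0061\t0063\t0041\nff41\tff43\tff21\n");
my $upper2 = make_upper("0061\t0063\t0044\nff41\tff43\tff24\n");

binmode STDOUT, ":utf8";
my $tim = "abcdef \x{ff41}\x{ff42}\x{ff43}\x{ff44}\x{ff45}\x{ff46}";
print "upper1( $tim ) => ", $upper1->($tim), "\n";
print "upper2( $tim ) => ", $upper2->($tim), "\n";
```

Because each closure carries its own lookup hash, the order of the two calls makes no difference: the first turns "abc" into "ABC" and the second turns it into "DEF", with "def" untouched in both cases.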

I have to admit, I don't see the point of this feature, except to make up some really wicked obfu.

(updated to add readmore tags)


In reply to Re^2: User-Defined Case Mappings by graff
in thread User-Defined Case Mappings by timgreenwood
