eol has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks (novice though I be),

I was attempting a simple task but got highly unexpected results, and I am confused as to where exactly the error is. Before I start on the problem, let me state that everything worked just fine before I added use utf8; and changed all my POSIX regex classes to \p{unicode}. The ONLY changes I made were adding use utf8; and replacing the POSIX regexes with Unicode ones.

Also ... the input file is:

create\tn\tbob\tjim\t1st\tfoo 123 bar\t314.123.3456

use 5.6.0;
use warnings;
use strict;
use utf8;

my %user;

while (<>){
    /^
    (\p{L}*)\p{Cc}                              # Action to be performed
    (\p{L}*)\p{Cc}                              # Need mailbox created?
    (\p{L}*)\p{Cc}                              # Last Name
    (\p{L}*)\p{Cc}                              # First Name
    ([\p{L}&&\p{N}]*)\p{Cc}                     # Rank
    ([\p{L}&&\p{N}\p{P}&&\p{Zs}]*)\p{Cc}        # Unit
    (\(?31\p{N}\)?[-. ]?\p{N}{3}[-. ]?\p{N}{4}) #DSN Number
    $/x;

    %user = (
        action => lc($1),
        mail   => lc($2),
        lname  => ucfirst(lc($3)),
        fname  => ucfirst(lc($4)),
        rank   => uc($5),
        unit   => $6,
        phone  => $7,
        user   => ucfirst(lc($3)).uc(substr($4,0,1))
    );
}

foreach my $key (keys %user) {
    print "$key $user{$key}\n";
}
__END__

This code outputs:

fname Jim
unit foo 123 bar
mail n
user 00eJ
phone 314.123.3456
lname Bob
rank 1ST
action create

Notice the strangeness with user => ...: its first part doesn't match lname => ...; instead of Bob it contains 00e.

Now I have been discussing this on a mailing list to no avail and it was suggested I try using the following snippet in the above code:
-user => ucfirst(lc($3)).uc(substr($4,0,1))
+user => "@{[ ucfirst (lc $3) . uc (substr $4, 0, 1) ]}"

This produced exactly the same output:

fname Jim
unit foo 123 bar
mail n
user 00eJ
phone 314.123.3456
lname Bob
rank 1ST
action create

It was then suggested that I try the following code:

use 5.6.0;
use warnings;
use strict;
use utf8;

my %user;

while (<>){
    /^
    (\p{L}*)\p{Cc}                              # Action to be performed
    (\p{L}*)\p{Cc}                              # Need mailbox created?
    (\p{L}*)\p{Cc}                              # Last Name
    (\p{L}*)\p{Cc}                              # First Name
    ([\p{L}&&\p{N}]*)\p{Cc}                     # Rank
    ([\p{L}&&\p{N}\p{P}&&\p{Zs}]*)\p{Cc}        # Unit
    (\(?31\p{N}\)?[-. ]?\p{N}{3}[-. ]?\p{N}{4}) #DSN Number
    $/x;

    print "1=$1 2=$2 3=$3 4=$4 5=$5 6=$6 7=$7\n";

    my $user = $3;
    print "user=$user\n";
    $user = lc $3;
    print "user=$user\n";
    $user = ucfirst $user;
    print "user=$user\n";
    my $tmp .= uc (substr $4, 0, 1);
    print "tmp=$tmp\n";
    $user = "$user$tmp";
    print "user=$user\n";

    %user = (
        action => lc $1,
        mail   => lc $2,
        lname  => ucfirst(lc $3),
        fname  => ucfirst(lc $4),
        rank   => uc $5,
        unit   => $6,
        phone  => $7,
        user   => @{[ucfirst(lc $3).uc(substr $4,0,1)]}
    );
}

foreach my $key (keys %user) {
    print "$key $user{$key}\n";
}
__END__

Output:

1=create 2=n 3=bob 4=jim 5=1st 6=foo 123 bar 7=314.123.3456
user=bob
user=00e
user=00e
tmp=J
user=00eJ
fname Jim
unit foo 123 bar
mail n
user BobJ
phone 314.123.3456
lname Bob
rank 1ST
action create

Now this honestly throws me even more. The hash assignment now yields the proper value, but the intermediate value checks are completely wrong (like before).

Yes, I could either build %user dynamically or add $user{$user}= later on, after I statically assign the initial values, but I am curious why something as simple as what I am trying produces such odd output.

Any ideas or help on this?

Thanks,

Eöl

Replies are listed 'Best First'.
Re: Odd issue with Perl 5.6, l(u)c, and unicode
by mirod (Canon) on Sep 14, 2002 at 08:15 UTC

    It looks like the problem lies with using the $n variables, which probably changes the utf-8 flag for them: if you move the user => ucfirst(lc($3)).uc(substr($4,0,1)) line above the ones where you create lname and fname then the user field is created properly but lname isn't.

    The problem also exists in perl 5.6.1 but it is fixed in perl 5.8.0. You should probably use 5.8.0 if you want to work with Unicode characters anyway: unicode regexps and hash keys are only documented as working starting with this version.
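
A minimal sketch of how the parsing could look on 5.8.0 or later, as suggested above. It takes the captures from the match in list context, so $1..$7 are never read afterwards. The sub name parse_record and the sample record are hypothetical, not from the original post, and this has not been checked against 5.6, so it should not be read as a confirmed workaround there. Note also that Perl character classes have no && intersection operator: [\p{L}&&\p{N}] matches a letter, a digit, or a literal &, so the sketch writes the classes as plain unions.

```perl
use 5.008;
use strict;
use warnings;
use utf8;

# Hypothetical helper (not from the thread): parse one tab-separated record
# and return the same fields the original script stored in %user.
sub parse_record {
    my ($line) = @_;
    chomp $line;

    # In list context the match returns its captures directly; on failure
    # the list is empty and we return nothing.
    my ($action, $mail, $lname, $fname, $rank, $unit, $phone) = $line =~ /^
        (\p{L}*)\p{Cc}                              # Action to be performed
        (\p{L}*)\p{Cc}                              # Need mailbox created?
        (\p{L}*)\p{Cc}                              # Last name
        (\p{L}*)\p{Cc}                              # First name
        ([\p{L}\p{N}]*)\p{Cc}                       # Rank
        ([\p{L}\p{N}\p{P}\p{Zs}]*)\p{Cc}            # Unit
        (\(?31\p{N}\)?[-. ]?\p{N}{3}[-. ]?\p{N}{4}) # DSN number
    $/x or return;

    return (
        action => lc($action),
        mail   => lc($mail),
        lname  => ucfirst(lc($lname)),
        fname  => ucfirst(lc($fname)),
        rank   => uc($rank),
        unit   => $unit,
        phone  => $phone,
        user   => ucfirst(lc($lname)) . uc(substr($fname, 0, 1)),
    );
}

# Sample record modelled on the input line in the question (assumed "1st" rank).
my %user = parse_record("create\tn\tbob\tjim\t1st\tfoo 123 bar\t314.123.3456\n");
print "$_ $user{$_}\n" for sort keys %user;
```

With the sample record, the user field comes out as BobJ and lname as Bob. If the real input contains non-ASCII names, the input handle would presumably also need an explicit encoding layer, which this sketch leaves out.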