
Hello fellow monks (novice though I be),

I was attempting a simple task but got highly unexpected results, and I'm confused as to where exactly the error is. Before I describe the problem, let me state that it worked just fine before I added use utf8; and changed all my POSIX character classes to \p{...} Unicode properties. Those are the ONLY changes I made: use utf8; and the Unicode properties replacing the POSIX classes.

The input file is (each \t stands for a literal tab character):

create\tn\tbob\tjim\t1s\tfoo 123 bar\t314.123.3456

use 5.6.0;
use warnings;
use strict;
use utf8;

my %user;

while (<>){
  /^
    (\p{L}*)\p{Cc}    # Action to be performed
    (\p{L}*)\p{Cc}     # Need mailbox created?
    (\p{L}*)\p{Cc}    # Last Name
    (\p{L}*)\p{Cc}    # First Name
    ([\p{L}&&\p{N}]*)\p{Cc}    # Rank
    ([\p{L}&&\p{N}\p{P}&&\p{Zs}]*)\p{Cc}    # Unit
    (\(?31\p{N}\)?[-. ]?\p{N}{3}[-. ]?\p{N}{4})   #DSN Number
  $/x;

  %user = (
          action => lc($1),
          mail => lc($2),
          lname => ucfirst(lc($3)),
          fname => ucfirst(lc($4)),
          rank => uc($5),
          unit => $6,
          phone => $7,
          user => ucfirst(lc($3)).uc(substr($4,0,1))
          );
}

foreach my $key (keys %user) {
  print "$key $user{$key}\n";
}

__END__

This code outputs:

fname Jim
unit foo 123 bar
mail n
user 00eJ
phone 314.123.3456
lname Bob
rank 1ST
action create

Notice the strangeness with user: it should start with the lname value (Bob), but instead it starts with 00e.
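One defensive pattern, whatever the root cause turns out to be: assign the captures to lexical variables in the same statement as the match, and never hand $1..$7 directly to lc, ucfirst or anything else. The sketch below assumes (this is not confirmed for 5.6) that the stray 00e comes from $3 changing while the case-mapping functions run; parse_record is a hypothetical name. It also writes the Rank and Unit classes as plain unions: inside a Perl [...] class, && is not intersection (that is Java-style syntax), each & is just a literal ampersand, and because Unicode letters and numbers are disjoint general categories a real intersection of \p{L} and \p{N} would match nothing at all.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: copy every capture into a lexical in the same
# statement as the match, so later calls never read $1..$7 themselves.
sub parse_record {
    my ($line) = @_;
    my ($action, $mail, $lname, $fname, $rank, $unit, $phone) = $line =~ /^
        (\p{L}*)\p{Cc}                                # Action to be performed
        (\p{L}*)\p{Cc}                                # Need mailbox created?
        (\p{L}*)\p{Cc}                                # Last Name
        (\p{L}*)\p{Cc}                                # First Name
        ([\p{L}\p{N}]*)\p{Cc}                         # Rank: letters or digits
        ([\p{L}\p{N}\p{P}\p{Zs}]*)\p{Cc}              # Unit
        (\(?31\p{N}\)?[-. ]?\p{N}{3}[-. ]?\p{N}{4})   # DSN Number
    $/x or return;    # no match: return an empty list

    # Only plain lexicals are passed to lc/uc/ucfirst/substr from here on.
    return (
        action => lc($action),
        mail   => lc($mail),
        lname  => ucfirst(lc($lname)),
        fname  => ucfirst(lc($fname)),
        rank   => uc($rank),
        unit   => $unit,
        phone  => $phone,
        user   => ucfirst(lc($lname)) . uc(substr($fname, 0, 1)),
    );
}

my %user = parse_record("create\tn\tbob\tjim\t1s\tfoo 123 bar\t314.123.3456\n");
print "$_ $user{$_}\n" for sort keys %user;
```

Because the hash is built from copies, the order in which lc and ucfirst are called no longer matters.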

I have been discussing this on a mailing list to no avail, and it was suggested that I change the user line in the code above as follows:

-user => ucfirst(lc($3)).uc(substr($4,0,1))
+user => "@{[ ucfirst (lc $3) . uc (substr $4, 0, 1) ]}"

This produced exactly the same output:

fname Jim
unit foo 123 bar
mail n
user 00eJ
phone 314.123.3456
lname Bob
rank 1ST
action create
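The "@{[ ... ]}" idiom builds an anonymous array from the expression and interpolates it into a string, so the expression inside (including the lc call on $3) is evaluated just as before; only the packaging of the result changes. That fits with it giving identical output. A small check on ordinary lexicals, where no capture variables are involved:

```perl
use strict;
use warnings;

my ($lname, $fname) = ('bob', 'jim');   # plain lexicals, not $3/$4

# Same expression, once directly and once through "@{[ ... ]}".
my $plain        = ucfirst(lc $lname) . uc(substr $fname, 0, 1);
my $interpolated = "@{[ ucfirst(lc $lname) . uc(substr $fname, 0, 1) ]}";

print "$plain $interpolated\n";   # BobJ BobJ
```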

It was then suggested that I try the following code:

use 5.6.0;
use warnings;
use strict;
use utf8;

my %user;

while (<>){
  /^
    (\p{L}*)\p{Cc}    # Action to be performed
    (\p{L}*)\p{Cc}     # Need mailbox created?
    (\p{L}*)\p{Cc}    # Last Name
    (\p{L}*)\p{Cc}    # First Name
    ([\p{L}&&\p{N}]*)\p{Cc}    # Rank
    ([\p{L}&&\p{N}\p{P}&&\p{Zs}]*)\p{Cc}    # Unit
    (\(?31\p{N}\)?[-. ]?\p{N}{3}[-. ]?\p{N}{4})   #DSN Number
  $/x;
  print "1=$1 2=$2 3=$3 4=$4 5=$5 6=$6 7=$7\n";
  
  my $user = $3;
  print "user=$user\n";
  $user = lc $3;
  print "user=$user\n";
  $user = ucfirst $user;
  print "user=$user\n";
  my $tmp .= uc (substr $4, 0, 1);
  print "tmp=$tmp\n";
  $user = "$user$tmp";
  print "user=$user\n";

  
  %user = (
          action => lc $1,
          mail => lc $2,
          lname => ucfirst(lc $3),
          fname => ucfirst(lc $4),
          rank => uc $5,
          unit => $6,
          phone => $7,
          user => @{[ucfirst(lc $3).uc(substr $4,0,1)]}
          );
}

foreach my $key (keys %user) {
  print "$key $user{$key}\n";
}

__END__

Output:

1=create 2=n 3=bob 4=jim 5=1st 6=foo 123 bar 7=314.123.3456
user=bob
user=00e
user=00e
tmp=J
user=00eJ
fname Jim
unit foo 123 bar
mail n
user BobJ
phone 314.123.3456
lname Bob
rank 1ST
action create

Now this honestly throws me even more. The expression in the hash now produces the proper value (user BobJ), but the intermediate checks are completely wrong, just like before.

Yes, I could either build %user dynamically OR add $user{$user}= later on, after statically assigning the initial values, but I am curious why something as simple as what I am trying produces such odd output.
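For comparison, perl documents the capture variables as dynamically scoped: a successful match inside a called sub should not change the caller's $1..$n once that sub returns, while a new successful match in the same scope does replace them. The sketch below (inner_match is a hypothetical helper) shows that expected behaviour on a current perl. In the 5.6 trace above, my $user = $3 printed bob, yet the very next statement, lc $3, printed 00e. One possible explanation, which is an assumption the trace does not prove, is that 5.6's Unicode case-mapping code ran a match of its own while lc was reading $3, without the isolation shown here.

```perl
use strict;
use warnings;

# Hypothetical helper that performs its own successful match.
sub inner_match {
    "xyz" =~ /(x)(y)(z)/ or die "inner match failed";
    return "inner saw 3=$3";
}

"create n bob" =~ /(\w+) (\w+) (\w+)/ or die "outer match failed";
my $before = $3;               # bob
my $inner  = inner_match();    # match inside the sub
my $after  = $3;               # caller's match is back in effect: bob

"a b c" =~ /(\w) (\w) (\w)/ or die "second match failed";
my $replaced = $3;             # new match in the SAME scope: c

print "before=$before inner=($inner) after=$after replaced=$replaced\n";
```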

Any ideas or help on this?

Thanks,

Eöl

In reply to Odd issue with Perl 5.6, l(u)c, and unicode by eol
