rubenstein has asked for the wisdom of the Perl Monks concerning the following question:

I can't figure out why this does not work:

use Unicode::UCD qw/charscript charblock/;

my $infile = shift @ARGV;
open (INPUT, "<:utf8", "$infile") or die "cant open $infile: $!";
$/ = undef;
my $string = <INPUT>;
my @array = split //, $string;
foreach my $uChar (@array) {
    $uChar = ord($uChar);
    print "DEBUG uChar: $uChar\n";   # works fine (decimal value of each char is printed)
    $c = charblock($uChar);
    print "DEBUG: $c\n";             # produces warning and prints blank
}
The warning is:
Use of uninitialized value in concatenation (.) or string at freq\points.pl line 34.
I have tested the charblock function with
my $c = charblock(1578);
print "$c\n";
and this works fine, printing "Arabic" as it should.
What am I missing?

Re: problem with Unicode::UCD
by djohnston (Monk) on May 19, 2005 at 22:58 UTC
    The source of the undefined value is the call to charblock. The Unicode::UCD documentation for its charblock routine states, "If the argument is not a known character block, 'undef' is returned". A couple of possible solutions:
    my $c = charblock($uChar) || $uChar;   # default to the original value
    # or
    my $c = charblock($uChar) or next;     # skip this iteration
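A minimal, runnable sketch of both suggestions, with the lookups wrapped in two subs. The names `block_or_codepoint` and `known_blocks` are hypothetical and not part of Unicode::UCD. It uses `//` (defined-or, Perl 5.10 or later) in place of `||` so that only undef triggers the fallback; since block names are never false strings, the two behave the same in practice here. Note that either option only masks the undef; it does not explain why charblock returned it.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock);

# Option 1: fall back to the code point itself when charblock() returns undef.
# '//' (defined-or) only replaces undef, unlike '||', which replaces any false value.
sub block_or_codepoint {
    my ($cp) = @_;
    return charblock($cp) // $cp;
}

# Option 2: skip code points whose block lookup returns undef.
sub known_blocks {
    my @blocks;
    for my $cp (@_) {
        my $block = charblock($cp);
        next unless defined $block;    # the 'or next' variant
        push @blocks, $block;
    }
    return @blocks;
}
```

For example, `block_or_codepoint(1578)` gives "Arabic", matching the original poster's standalone test.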
    (Note that I know next to nothing about Unicode, though.)
Re: problem with Unicode::UCD
by ikegami (Patriarch) on May 19, 2005 at 22:52 UTC

    I have no idea what the problem is (then again, I don't know anything about Unicode), but you can find out which character is causing the problem by adding the following line after $c = charblock($uChar);

    print "DEBUG: charblock($uChar) returned undef!\n" unless defined $c;
Re: problem with Unicode::UCD
by scmason (Monk) on May 19, 2005 at 22:39 UTC
    Well, the only thing I can see is that you have declared your other variables with 'my', as you do in your test of the charblock function.

    In your main example, I do not see where you have declared $c. Of course, if you were using strict, this would cause a compile-time error. Maybe declare $c, set it to some default, like the empty string, and try it out.

    This suggestion is probably a non-starter, but it is the first thing that sprang to mind.

    "Never take yourself too seriously, because everyone knows that fat birds dont fly" -FLC
      It turns out that the problem was the slurping.
      When I added
      $/ = "\n";
      after
      my $string = <INPUT>;
      things worked fine. My standalone test had called the function before the data was slurped, while $/ still had its default value, which is why it worked. Annoying.
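Why would slurping break charblock? The sketch below assumes, as the next reply guesses, that the module reads a data file line by line. `count_records` and the sample data are hypothetical, a simplified parser for records in the style of Unicode's Blocks.txt, not Unicode::UCD's actual code. With `$/` undefined, `<$fh>` returns the whole text as a single record, so a pattern written for one line never matches and nothing gets loaded.

```perl
use strict;
use warnings;

# Hypothetical sample data in the style of Unicode's Blocks.txt:
# one "start..end; name" record per line.
my $data = "0000..007F; Basic Latin\n0600..06FF; Arabic\n";

# Hypothetical line-oriented parser (NOT Unicode::UCD's real code).
# It reads records with <$fh>, so what counts as "one line" depends on $/.
sub count_records {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        # Expects exactly one record per read; '.' does not match "\n".
        $count++ if $line =~ /^[0-9A-F]+\.\.[0-9A-F]+;\s*.+?\s*\z/;
    }
    close $fh;
    return $count;
}

print "default \$/: ", count_records($data), " records\n";    # 2 records
{
    local $/;    # slurp mode, as set globally in the original script
    print "undef \$/:   ", count_records($data), " records\n"; # 0 records
}
```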
        I haven't looked at the source for Unicode::UCD, but I'm guessing that it's probably doing some file input somewhere. (Not an unreasonable thing.)

        That's one good reason why slurp mode should really be done like this, especially in code that uses modules, to make sure that undef'ing $/ is localized to just the code block where this is needed:

        my $filedata;
        open( IN, '<', $filename ) or die "whatever. $!";
        {
            local $/;
            $filedata = <IN>;
        }
        close IN;
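Applied to the original script, that advice might look like the following sketch. It assumes the input file is UTF-8 and uses the stricter `:encoding(UTF-8)` layer instead of the original `:utf8`; `slurp_utf8` and `blocks_of` are illustrative names, not existing APIs. Because `local $/` is confined to the `do` block, `$/` is back to `"\n"` by the time charblock runs, and `$c` from the original loop is now declared.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock);

# Read a whole UTF-8 file into one string. 'local $/' undefs the input
# record separator only until the do-block exits; afterwards $/ is "\n" again,
# so modules that read their own files line by line are unaffected.
sub slurp_utf8 {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path) or die "can't open $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return $text;
}

# The original loop with its variables declared: one block name per character.
sub blocks_of {
    my ($string) = @_;
    my @blocks;
    for my $char (split //, $string) {
        my $c = charblock(ord $char);
        push @blocks, defined $c ? $c : '(unknown)';
    }
    return @blocks;
}

if (@ARGV) {
    my $text = slurp_utf8($ARGV[0]);
    print "$_\n" for blocks_of($text);
}
```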