problem with Unicode::UCD

rubenstein has asked for the wisdom of the Perl Monks concerning the following question:

I can't figure out why this does not work:

use Unicode::UCD qw/charscript charblock/;
my $infile = shift @ARGV;
open (INPUT, "<:utf8", "$infile")    or die "cant open $infile: $!";
$/ = undef;
my $string = <INPUT>;
my @array = split //, $string;
foreach my $uChar (@array) { 
   $uChar = ord($uChar);
   print "DEBUG uChar: $uChar\n"; #works fine (decimal value of each char is printed)
   $c = charblock($uChar);
   print "DEBUG: $c\n";  #produces warning and prints blank
}

The warning is:
Use of uninitialized value in concatenation (.) or string at freq\points.pl line 34.
I have tested the charblock function with

my $c = charblock(1578);
print "$c\n";

and this works fine, printing "Arabic" as it should
What am I missing?


Replies are listed 'Best First'.
Re: problem with Unicode::UCD by djohnston (Monk) on May 19, 2005 at 22:58 UTC
The source of the undefined value is the call to charblock. The docs for Unicode::UCD say of its charblock routine: "If the argument is not a known character block, 'undef' is returned". A couple of solutions might be:

my $c = charblock($uChar) || $uChar;   # default to original value
# or
my $c = charblock($uChar) or next;     # skip iteration

(Take note that I know next to nothing about Unicode, though.)
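As a minimal sketch of the "guard against undef" idea above: the helper name `block_or_fallback` and the fallback label are made up for illustration, and whether charblock can actually return undef for a given code point depends on the Unicode::UCD version installed.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock);

# Hypothetical helper (not part of Unicode::UCD): return the block name
# for a code point, or a readable fallback label if charblock() gives undef.
sub block_or_fallback {
    my ($code_point) = @_;
    my $block = charblock($code_point);
    return defined $block
        ? $block
        : sprintf('(no block for U+%04X)', $code_point);
}

# Walk a short sample string: one Latin letter and one Arabic letter.
for my $char (split //, "a\x{062A}") {
    my $code_point = ord $char;
    print "$code_point: ", block_or_fallback($code_point), "\n";
}
```

Testing with `defined` rather than `||` reports the failing code point instead of silently substituting a value, which helps when hunting for the input that triggers the warning.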
Re: problem with Unicode::UCD by ikegami (Patriarch) on May 19, 2005 at 22:52 UTC
I have no idea what the problem is -- then again, I don't know anything about Unicode -- but you can find out which character is giving a problem by adding the following after `$c = charblock($uChar);`: `print "DEBUG: charblock($uChar) returned undef!\n" unless defined $c;`
Re: problem with Unicode::UCD by scmason (Monk) on May 19, 2005 at 22:39 UTC
Well, the only thing that I can see is that you have declared your other variables with 'my', as you do in your test of the charblock function. In your main example, I do not see where you have declared $c. Of course, if you are using strict, this would cause a compile-time error. Maybe declare $c, set it to some default, like the empty string, and try it out. This suggestion is probably a non-starter, but it is the first thing that sprang to mind for me. "Never take yourself too seriously, because everyone knows that fat birds don't fly" -FLC
Re^2: problem with Unicode::UCD by rubenstein (Novice) on May 19, 2005 at 23:51 UTC
It turns out that the problem was the slurping. When I added `$/ = "\n";` after `my $string = <INPUT>;` things worked fine. I had done my test use of the function before slurping up the actual data. Annoying.
Re^3: problem with Unicode::UCD by graff (Chancellor) on May 20, 2005 at 02:37 UTC
I haven't looked at the source for Unicode::UCD, but I'm guessing that it's probably doing some file input somewhere. (Not an unreasonable thing.) That's one good reason why slurp mode should really be done like this, especially in code that uses modules, to make sure that undef'ing $/ is localized to just the code block where this is needed:

my $filedata;
open( IN, '<', $filename ) or die "whatever. $!";
{
    local $/;
    $filedata = <IN>;
}
close IN;
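To make the effect concrete, here is a small self-contained sketch. It does not use Unicode::UCD at all: `count_lines` is a hypothetical stand-in for any module routine that reads a data file line by line, and it reads from an in-memory filehandle so the example needs no files on disk. Whether Unicode::UCD itself is affected will depend on the version installed, so treat this as an illustration of the mechanism only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module routine that reads its data
# line by line and so depends on the current value of $/.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++;
    }
    close $fh;
    return $count;
}

my $data = "one\ntwo\nthree\n";

# Localized slurp: $/ is undef only inside this block.
my $slurped;
{
    local $/;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    $slurped = <$fh>;
    close $fh;
}
my $after_local = count_lines($data);    # $/ is back to "\n" here

# Global slurp: $/ stays undef, so the helper sees one giant "line".
$/ = undef;
my $after_global = count_lines($data);
$/ = "\n";                               # restore by hand

print "after local \$/:  $after_local line(s)\n";
print "after global \$/: $after_global line(s)\n";
```

With the localized version the helper still sees three separate lines; with the global assignment it reads the whole text as a single record, which is the kind of silent breakage described above.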
Re^4: problem with Unicode::UCD by holli (Abbot) on May 20, 2005 at 05:07 UTC
Re^4: problem with Unicode::UCD by rubenstein (Novice) on May 20, 2005 at 16:01 UTC