Mr.Churka has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I recently put together a bit of code using XML::Twig to handle a massive XML document. I ran into an issue where the parser was generating a "Wide character in print" warning. A Google search revealed that this is usually caused by unicode being handled improperly. This led me to believe that the handlers may have mangled my XML due to unicode encoding (or decoding) errors. To find out, I tacked the ">:utf8" mode onto my open call so that the encoding is explicitly declared. This resulted in the following code:

    #!/bin/perl
    use strict;
    use warnings;
    use XML::Twig;
    use Tie::IxHash;

    my %Items;
    tie %Items, "Tie::IxHash";

    my $twig = XML::Twig->new(
        twig_handlers => {
            _all_ => sub {
                my $Item_master_Ancestory = $_->ancestors;
                my $element_match = ($_->tag);
                my $text = ($_->trimmed_text);
                my $coupled = join( ' - ' => " "x$Item_master_Ancestory,
                    $element_match, values %{$_->atts}, $text);
                if (!defined $Items{$coupled}) { $Items{$coupled} = 1 }
                else { $Items{$coupled}++; }
                my( $t, $elt) = @_;
                $t->purge;
            },
        }
    );
    $twig->parsefile( '500syncItemMaster.xml');

    open(SUMMARY, ">:utf8", ">Item Summary.txt");
    my @k = keys %Items;
    foreach my $k (@k) {
        print SUMMARY ("$k => $Items{$k}\n");
    };    # output the twig
    close(SUMMARY);

It's not pretty, but it was getting the job done prior to the ">:utf8" addition. Now I'm being informed of "print() on closed filehandle SUMMARY at my_perl_parser.pl line 30." I tried deleting the explicit close on line 30, with no effect. Is the > prior to Item Summary no longer telling perl to create a new file to write to?

Replies are listed 'Best First'.
Re: Solution to unicode issue creating more problems than it solves
by Corion (Patriarch) on Nov 27, 2007 at 14:30 UTC

    The error likely is on this line:

    open(SUMMARY, ">:utf8", ">Item Summary.txt");

    and Perl would have told you about it, had you used proper error checking:

        my $outfile = ">Item Summary.txt";
        open(SUMMARY, ">:utf8", $outfile)
            or die "Couldn't create '$outfile': $!";

    Of course, the warning that SUMMARY is a closed filehandle is still accurate: not because the handle was closed behind your back, but because it was never opened in the first place. Also, neither XML::Twig nor UTF-8 plays into the problem at all.
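
    For comparison, here is a minimal sketch of how the open from the question could look once the stray '>' is dropped from the file name and the result is checked. It uses a lexical filehandle and the :encoding(UTF-8) layer rather than the bare SUMMARY handle and :utf8 from the thread; write_summary is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): writes each line to $path as UTF-8.
sub write_summary {
    my ($path, @lines) = @_;

    # Three-argument open: mode and layer go in the second argument,
    # the file name (without a leading '>') goes in the third.
    open(my $fh, '>:encoding(UTF-8)', $path)
        or die "Couldn't create '$path': $!";

    print {$fh} "$_\n" for @lines;

    close($fh) or die "Couldn't close '$path': $!";
    return;
}

# One sample line containing a character above 0xFF
write_summary('Item Summary.txt', "item \x{263A} => 1");
```

    With the check in place, a failed open stops the script with the operating system's reason in $! instead of surfacing later as a "closed filehandle" warning.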

Re: Solution to unicode issue creating more problems than it solves
by ikegami (Patriarch) on Nov 27, 2007 at 14:54 UTC

    I ran into an issue where the parser was generating a "Wide character in print" warning. A Google search revealed that this is usually caused by unicode being handled improperly.

    "Wide character in print" means you are trying to store a string of characters without converting them to bytes first. Files can only contain bytes.

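    The character/byte distinction can be sketched as follows. This is an illustration under assumptions, not code from the thread: the sample string is made up, and an in-memory filehandle stands in for a real file so the result can be inspected. It shows the two usual ways of turning characters into bytes: calling Encode::encode yourself, or letting an encoding layer on the handle do it.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up sample text: "cafe" with an accented e, a space, and U+263A
my $chars = "caf\x{E9} \x{263A}";

# Option 1: convert the characters to UTF-8 bytes explicitly
my $bytes = encode('UTF-8', $chars);
print length($chars), " characters, ", length($bytes), " bytes\n";
# prints "6 characters, 9 bytes"

# Option 2: let an encoding layer on the handle do the conversion.
# An in-memory handle is used here so the written bytes can be compared.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer)
    or die "Couldn't open in-memory handle: $!";
print {$fh} $chars;
close($fh) or die "Couldn't close in-memory handle: $!";

print "layer output matches explicit encode\n" if $buffer eq $bytes;
```

    Printing $chars to a handle with no such layer is what triggers the "Wide character in print" warning, because U+263A does not fit in a single byte.
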
    Is the > prior to Item Summary no longer telling perl to create a new file to write to?

        open(my $fh, "> $file")          # Old style
        open(my $fh, '>', $file)         # Better 5.6 style
        open(my $fh, '>:utf8', $file)    # Better 5.6 style with auto encoding

    Always check the result of open for errors. You'll catch 99% of I/O errors.
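
    As a rough sketch of what checking open looks like in practice, the snippet below forces a failure by pointing at a directory that is assumed not to exist (the path is hypothetical). It shows a manual check of the return value, and then the core autodie pragma, which makes a failed open throw an exception on its own.

```perl
use strict;
use warnings;

# Hypothetical path, chosen so that open fails: the directory does not exist.
my $file = 'no_such_directory_for_demo/out.txt';

# Manual check: open returns false on failure and sets $!
my $opened = open(my $fh, '>:utf8', $file);
if (!$opened) {
    print "Couldn't create '$file': $!\n";
}

# Alternative: autodie turns a failed open into an exception,
# so a forgotten "or die" cannot go unnoticed.
my $ok = eval {
    use autodie qw(open);
    open(my $fh2, '>:utf8', $file);
    1;
};
print "autodie raised an exception\n" if !$ok;
```

    Either way, the failure is reported at the open itself rather than at the first print to the unopened handle.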