Hello monks,
I recently put together a bit of code using XML::Twig to handle a massive XML document. I ran into an issue where the parser was generating a "Wide character in print" warning. A Google search revealed that this is usually caused by Unicode being handled improperly. This led me to believe that the handlers may have mangled my XML due to Unicode encoding (or decoding) errors. To find out, I tacked on the ">:utf8" open mode so that everything is explicitly declared. This resulted in the following code:
#!/bin/perl
use strict;
use warnings;
use XML::Twig;
use Tie::IxHash;

my %Items;
tie %Items, "Tie::IxHash";

my $twig = XML::Twig->new(
    twig_handlers => {
        _all_ => sub {
            my $Item_master_Ancestory = $_->ancestors;
            my $element_match = ($_->tag);
            my $text = ($_->trimmed_text);
            my $coupled = join( ' - ' => " " x $Item_master_Ancestory,
                $element_match, values %{$_->atts}, $text);
            if (!defined $Items{$coupled}) { $Items{$coupled} = 1 }
            else { $Items{$coupled}++; }
            my( $t, $elt) = @_;
            $t->purge;
        },
    }
);
$twig->parsefile( '500syncItemMaster.xml');
open(SUMMARY, ">:utf8", ">Item Summary.txt");
my @k = keys %Items;
foreach my $k (@k) {print SUMMARY ("$k => $Items{$k}\n");}; # output the twig
close(SUMMARY);
It's not pretty, but it was getting the job done prior to the ">:utf8" addition. Now I'm being informed: "print() on closed filehandle SUMMARY at my_perl_parser.pl line 30." I tried deleting the explicit close on line 30, with no effect. Is the > prior to Item Summary no longer telling perl to create a new file to write to?
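For reference, here is a minimal sketch of how the three-argument form of open is usually written. In that form the mode (with its layer) is the second argument and the third argument is taken literally as the file name, so a leading ">" there would become part of the name rather than a mode. That is likely why the open fails on systems that do not allow ">" in file names, and since the return value of open is never checked, the first visible symptom is the print on a closed filehandle. The helper name write_summary, the lexical filehandle, and the sample data are assumptions made for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): writes "key => count"
# lines to a UTF-8 encoded file and returns the number of lines written.
sub write_summary {
    my ($path, $counts) = @_;

    # Three-argument open: the mode and layer go in the second argument,
    # the third argument is the bare file name with no leading ">".
    # Checking the return value reports a failed open immediately.
    open(my $fh, '>:encoding(UTF-8)', $path)
        or die "Cannot open '$path' for writing: $!";

    my $lines = 0;
    for my $key (sort keys %$counts) {
        print {$fh} "$key => $counts->{$key}\n";
        $lines++;
    }

    close($fh) or die "Cannot close '$path': $!";
    return $lines;
}

# Sample data (made up): one key holds a character above U+00FF, the kind
# that produces "Wide character in print" on a handle with no encoding layer.
my %sample = ( "item \x{263A}" => 2, "other" => 1 );
write_summary('Item Summary.txt', \%sample);
```

The same idea should carry over to the bareword SUMMARY handle: keep ">:utf8" (or ">:encoding(UTF-8)") as the second argument, pass "Item Summary.txt" without the ">" as the third, and add an "or die" so a failed open is reported where it happens.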