Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Oh great monks, I find I am in need of your wisdom. I am trying to use XML::Twig to process an ungainly XML file as a stream of discrete parts. I am using ActivePerl 5.6.1.631 on Win32. I have used variations of the XML tools provided for Perl before for other tasks, but these files are too big for things like XPath, etc. I have loaded the Twig module and am trying to use the twig_roots option, since all I need to do is read data from the tags of interest and then purge the rest of the tree. When I run the code below I get this error message: "Undefined subroutine &main:: called at C:/Perl/site/lib/XML/Parser/Expat.pm line 439." In Expat.pm I see the following (lines 412 through 448):
    sub parse {
        my $self = shift;
        my $arg  = shift;
        croak "Parse already in progress (Expat)" if $self->{Used};
        $self->{Used} = 1;
        my $parser = $self->{Parser};
        my $ioref;
        my $result = 0;

        if (defined $arg) {
            if (ref($arg) and UNIVERSAL::isa($arg, 'IO::Handler')) {
                $ioref = $arg;
            }
            else {
                eval {
                    no strict 'refs';
                    $ioref = *{$arg}{IO};
                };
            }
        }

        if (defined($ioref)) {
            my $delim = $self->{Stream_Delimiter};
            my $prev_rs;

            $prev_rs = ref($ioref)->input_record_separator("\n$delim\n")
                if defined($delim);

            $result = ParseStream($parser, $ioref, $delim);    ### Line 439

            ref($ioref)->input_record_separator($prev_rs)
                if defined($delim);
        }
        else {
            $result = ParseString($parser, $arg);
        }

        $result or croak $self->{ErrorMessage};
    }
The only thing I can think is that ParseStream is undefined. A quick search through the Expat, Parser, and Twig modules reveals no other instance of "ParseStream." What could be going on? I have used PPM to "verify --force update" all relevant modules, but to no avail. Here is a snippet of my code:
    #!C:/Perl
    use XML::Parser;
    use XML::Parser::Expat;
    use XML::Twig;

    my $t = XML::Twig->new(
        twig_roots => { '/invoice/header/accountnumber' => \&print_elt_text }
    );
    $t->parsefile('MyHugeXML.xml');

    sub print_elt_text {
        my( $t, $elt ) = @_;
        print $elt->text;
        $t->purge;
    }
Thank you wise ones. Your wisdom is like ambrosia.

Re: XML-Twig
by bart (Canon) on Aug 27, 2002 at 00:49 UTC
    The only thing I can think is that ParseStream is undefined.
    Urm... I don't think so. Let me tell you that I'm just guessing... It looks to me like ParseStream is defined in XML/Parser/Expat.xs, the C binding layer between plain Perl and the C library. At least I can see this in there:
        int
        XML_ParseStream(parser, ioref, delim)
            XML_Parser  parser
            SV *        ioref
            SV *        delim
          CODE:
            ...

    What I think is that ParseStream() needs a callback for each "event". That means that this function calls some plain Perl subs, and I think that is where something goes wrong. Perhaps it is simply missing a proper callback sub.

    Could you make a small XML file available, that is compatible with your program, and which exhibits the problem? That way, I, and maybe some other people here as well, could play a little with your program.
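
The missing-callback guess above is at least consistent with the wording of the error. In plain Perl, calling through a scalar that holds an empty string (with strict refs switched off) is treated as a call to a sub with an empty name in package main, and that produces a message of the same form. This is only a sketch of how such a message can arise: `$handler` is a hypothetical stand-in, not a variable taken from XML::Parser or XML::Twig, and it does not show that this is what happens in the original poster's installation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a callback slot that was never filled in properly.
my $handler = '';

# Call through the empty name as a symbolic reference and trap the failure.
my $ok = eval {
    no strict 'refs';    # symbolic sub calls are only allowed without strict refs
    $handler->();
    1;
};

# $@ carries the message Perl raised for the failed call.
my $error = $ok ? '' : $@;
print "Error: $error";
```

The message reads "Undefined subroutine &main:: called", followed by the file and line of the failing call, which is why the line number in the report points at the caller rather than at the missing sub.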

Re: XML-Twig
by PodMaster (Abbot) on Aug 27, 2002 at 07:49 UTC
    Works perfectly well for me. This is what I get when I run my example (yours, modified):
    DON't LEAVE OUT RELEVANT INFORMATION
    
    XML::Twig::VERSION 3.04
    XML::Parser::VERSION 2.31
    XML::Parser::Expat::VERSION 2.31
    
    
     ... EEEEEEEEEEEEEEEEEEEEEEEKKKKK  ...
    
        use XML::Parser;
        use XML::Parser::Expat;
        use XML::Twig;

        $\="\n";
        print "DON't LEAVE OUT RELEVANT INFORMATION\n";
        print "XML::Twig::VERSION $XML::Twig::VERSION";
        print "XML::Parser::VERSION $XML::Parser::VERSION";
        print "XML::Parser::Expat::VERSION $XML::Parser::Expat::VERSION\n";

        my $t = XML::Twig->new(
            twig_roots => { '/node/data/field' => \&print_elt_text }
        );
        $t->parse( \*DATA );

        sub print_elt_text {
            my( $t, $elt ) = @_;
            print $elt->text;
            $t->purge;
        }

        __DATA__
        <?xml version="1.0" encoding="ISO-8859-1"?>
        <node id="193032" title="XML-Twig" >
        <type id="115"> perlquestion</type>
        <author id="961"> Anonymous Monk</author>
        <data>
        <field name="doctext"> ... EEEEEEEEEEEEEEEEEEEEEEEKKKKK ... </field>
        </data>
        </node>
    I suggest removing XML::Parser and XML::Twig completely and reinstalling them (something has gone bad).

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.
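
Before reinstalling as suggested above, it may help to confirm which copy of each module Perl actually loads, since a stale or mismatched file elsewhere in @INC could survive a reinstall. The following is a minimal sketch using only core Perl features (`require`, `%INC`, and the `VERSION` method); the module list is taken from this thread, and the report format is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules named in this thread; adjust the list as needed.
my @modules = qw(XML::Parser XML::Parser::Expat XML::Twig);

my @report;
for my $module (@modules) {
    # Turn Foo::Bar into Foo/Bar.pm, which is the key format used in %INC.
    (my $file = "$module.pm") =~ s{::}{/}g;

    if (eval { require $file; 1 }) {
        # VERSION returns the package's $VERSION, or undef if none is set.
        my $version = $module->VERSION;
        $version = 'unknown' unless defined $version;
        push @report, "$module $version loaded from $INC{$file}";
    }
    else {
        push @report, "$module failed to load";
    }
}

print "$_\n" for @report;
```

If the paths point somewhere other than the expected site/lib directory, or the XML::Parser and XML::Parser::Expat versions differ, that would support the idea that the installation itself is the problem.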