gregaryh has asked for the wisdom of the Perl Monks concerning the following question:

This one is baffling me. I have an XML document I am trying to process with XML::Twig and I am getting an error where I think it should not.

Here is the setup:
The XML I am trying to parse looks like this:

<document>
  <record>
    <attachment>data</attachment>
  </record>
</document>

my $twig = XML::Twig->new(twig_handlers=>{record => \&process_record, attachment => \&process_attachment});
$twig->parse($xml);

Then, in the handlers:

sub process_attachment {
    my ($twig, $attach) = @_;
    # do some data processing ....
    $attach->purge;
}

sub process_record {
    my ($twig, $record) = @_;
    # do some data processing ....
    $record->purge;
}

This produces an error: Can't locate object method "purge" via package "XML::Twig::Elt" at <line number>

Where line number points to $attach->purge;

But the examples given in the perldoc say: "You can also purge it if you don't need to output it (if you are just extracting some data from the document for example). The handler will be called again once the next relevant element has been parsed."

$attach->flush does seem to work but I don't want to print it, so why doesn't $attach->purge work?

If I use $twig->purge it kills the whole record, not just the attachment.

The problem I am trying to address is that a record in the XML can have several very large attachments. I want to parse each attachment (loading it into an array) and then purge just that attachment's XML from memory, leaving the rest of the record intact until it can be processed by its own handler.

Replies are listed 'Best First'.
Re: broken twig?
by mirod (Canon) on Sep 13, 2005 at 19:28 UTC

    Are you sure $twig->purge in process_attachment doesn't do what you want? In any case purge should be called on the twig object, not on an element.

    Just in case you need it (you shouldn't from your description of the problem), you can also use the purge_up_to method.

      That is the thing that confuses me. $twig->purge in process_attachment() seems to delete the whole record, not just the attachment. What I want to do is purge only the attachment.

      I also call purge in process_record, and it seems to do what is expected there (if I comment out the purge in process_attachment, that is), meaning that if I have multiple records, each one is purged in turn after processing.

      Using purge_up_to($attach) does the same thing. When process_record is called, the record has no children, so there is nothing to process. Am I doing something wrong in the way I handle the attachment?

      Here is a more faithful view of the XML:

      <document version="2.2">
        <record>
          <record_id>113082</record_id>
          <creation_ts>2005-09-08 11:53 MDT</creation_ts>
          <short_desc>test</short_desc>
          <delta_ts>2005-09-08 11:58:21 MDT</delta_ts>
          <attachment>
            <attachid>47644</attachid>
            <date>2005-09-08 11:55 MDT</date>
            <desc>diff</desc>
            <data>Large amounts of encoded data in excess of 5 MB </data>
          </attachment>
          <attachment> .... </attachment>
        </record>
      </document>

        Calling $twig->purge in the attachment handler will indeed purge everything before that element, meaning that the record_id, creation_ts, short_desc and delta_ts elements will be purged, but the (now empty) record will be kept. Is this what is causing you problems?

        Maybe what you are looking for is just $attach->delete in the attachment handler, then $twig->purge in the record handler. If Scalar::Util is installed, then using delete will release the memory for the attachment (at least so it can be re-used by the Perl process).
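The suggestion above (delete the attachment in its own handler, purge the twig in the record handler) might look roughly like the sketch below. It is only an illustration under assumptions: the sample XML is a cut-down version of the document shown earlier, and the @attachments and @records arrays and the shape of the hashes pushed onto them are hypothetical names chosen for the example, not part of the original code.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down sample document (assumption: real attachments are far larger).
my $xml = <<'XML';
<document version="2.2">
  <record>
    <record_id>113082</record_id>
    <short_desc>test</short_desc>
    <attachment><attachid>47644</attachid><data>AAAA</data></attachment>
    <attachment><attachid>47645</attachid><data>BBBB</data></attachment>
  </record>
</document>
XML

my @attachments;   # hypothetical: attachments collected for the current record
my @records;       # hypothetical: one summary hash per finished record

sub process_attachment {
    my ($twig, $attach) = @_;
    push @attachments, {
        id   => $attach->first_child_text('attachid'),
        data => $attach->first_child_text('data'),
    };
    # Remove only this element; the rest of the record stays in the tree.
    $attach->delete;
}

sub process_record {
    my ($twig, $record) = @_;
    # record_id is still available because nothing was purged earlier.
    push @records, {
        id          => $record->first_child_text('record_id'),
        attachments => [@attachments],
    };
    @attachments = ();
    # The record is fully processed, so everything parsed so far can go.
    $twig->purge;
}

my $twig = XML::Twig->new(
    twig_handlers => {
        attachment => \&process_attachment,
        record     => \&process_record,
    },
);
$twig->parse($xml);

printf "record %s has %d attachment(s)\n",
    $_->{id}, scalar @{ $_->{attachments} }
    for @records;
```

The point of the split is that delete is an element method and affects only the attachment, while purge is a twig method and discards everything parsed so far, which is why it belongs in the record handler.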

Re: broken twig?
by japhy (Canon) on Sep 13, 2005 at 19:29 UTC
    Have you tried doing $twig->purge_up_to($attach)?

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart