in reply to XML::Twig error reporting

Good question! This is not (yet!) documented, but the expat object, which gives you access to all of the Expat methods, including the current line and column numbers, can be accessed through the twig: it is in $t->{twig_parser}. So $t->{twig_parser}->current_line will give you the current line number. There is one caveat though: twig_handlers are called when an element has been completely parsed (so you can process its content), so you will get the position of the closing tag. That is of course enough to locate the element, but it might not be the most convenient starting point for editing the document. So you might want to "annotate" the document with the position of each opening tag, or at least of each tag you are interested in.

By the way, XML::Parser::Expat, which calls Expat to actually read the XML, does not set $. so you can't use that variable to get the line number.

So here is a version that properly outputs the line/column number for the opening tag of each element. If the line/column of the closing tag is OK then you don't need the start_tag_handler and you can get the position in the elt handler. And if you are concerned about size, you might want to limit calls to the start_tag_handler to those elements that you check later.

#!/bin/perl -w
use strict;
use XML::Twig;

my $t= new XML::Twig(
           # called for all opening tags
           start_tag_handlers => { _all_ => \&store_position },
           # called for each closing elt tag
           twig_handlers      => { elt => \&elt });
$t->parse( \*DATA);

sub store_position
  { my( $t, $elt)= @_;
    my $line   = $t->{twig_parser}->current_line;    # $t->{twig_parser} is the expat object
    my $column = $t->{twig_parser}->current_column;
    $elt->{my_atts}= { line => $line, column => $column };  # crude but works
  }

sub elt
  { my( $t, $elt)= @_;
    if( my $error= $elt->att( 'error'))
      { my $line   = $elt->{my_atts}->{line};
        my $column = $elt->{my_atts}->{column};
        print STDERR "error $error at $line:$column\n";
      }
  }

__DATA__
<doc>
  <elt>this one is OK</elt>
  <elt error="foo">not this one though</elt>
  <elt>OK</elt>
  <elt error="bar">here is a bar error</elt>
</doc>

By the way, I have a question on this last piece of code: in order to store the position information I simply use a new field in the hash (my_atts). This is convenient but hardly robust: what if the object implementation changes to a blessed scalar or a closure? Or if it uses a "my_atts" field? What would be a better way? Inheritance seems difficult, as the elements are created and processed by XML::Twig. Should XML::Twig document a field that can be used for this, both for twigs and for elements?

Replies are listed 'Best First'.
Re: Re: XML::Twig error reporting
by John M. Dlugosz (Monsignor) on Nov 06, 2001 at 21:02 UTC
    Thanks, that's exactly what I need.

    For recording the information in the element, I would use the regular att() feature. E.g. $elt->set_att('#line', $line);

    That follows the example of #PCDATA which uses # for a "special" name used like an identifier.
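A minimal sketch of that suggestion, assuming set_att() and att() accept a name starting with '#'. The handler names store_position and check_elt and the @errors array are only illustrative, and the expat object is reached through $t->{twig_parser} as in the post above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my @errors;    # collected messages, illustrative only

my $t = XML::Twig->new(
    start_tag_handlers => { _all_ => \&store_position },   # opening tags
    twig_handlers      => { elt   => \&check_elt },        # fully parsed elt
);

my $xml = <<'XML';
<doc>
  <elt>this one is OK</elt>
  <elt error="foo">not this one though</elt>
</doc>
XML

$t->parse($xml);
print "$_\n" for @errors;

# Record the opening-tag position in "special" attributes
sub store_position {
    my ($t, $elt) = @_;
    $elt->set_att('#line',   $t->{twig_parser}->current_line);
    $elt->set_att('#column', $t->{twig_parser}->current_column);
}

# Read the position back through the regular att() interface
sub check_elt {
    my ($t, $elt) = @_;
    if (defined(my $error = $elt->att('error'))) {
        push @errors, sprintf 'error %s at %s:%s',
            $error, $elt->att('#line'), $elt->att('#column');
    }
}
```

Because the position lives in ordinary attributes, only strings (here, numbers) can be stored this way.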

    A more direct answer to the last question is yes, document an extension mechanism rather than relying on the object's implementation. Simply providing a hashref where users can store their stuff is sufficient. A fancier way would be to provide a way to manage it so different users don't clobber each other, but convention can do just as well: tell them to use their fully-qualified module name as the start of the key.
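The key-naming convention could look roughly like this. It is a plain-Perl sketch: the user_data field and the module names My::Linter and Other::Tool are hypothetical, and the hash stands in for an element object rather than reflecting XML::Twig's real layout.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an element with a documented user-data hashref
my $elt = { user_data => {} };

# Each module prefixes its keys with its own fully-qualified name,
# so two independent users of the same element do not collide.
$elt->{user_data}{'My::Linter::position'}  = [ 3, 2 ];
$elt->{user_data}{'Other::Tool::position'} = 'byte 57';

my ($line, $column) = @{ $elt->{user_data}{'My::Linter::position'} };
print "My::Linter saw the element at $line:$column\n";
```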

    —John

Re: Re: XML::Twig error reporting
by John M. Dlugosz (Monsignor) on Nov 07, 2001 at 00:43 UTC
    A thought: making special things like "#line" stored with attributes, as opposed to some other type of mechanism, means that it will work with all the selection and filtering mechanisms.

    All you need is a switch so printing will skip these "special" attributes, denoted by having illegal names.

    —John

      I have to think about it. The problem I see is that, although this is a useful and clever trick, it would probably be used quite infrequently, while slowing down every print or sprint... Though as I would limit it to attributes starting with #, it would only cost one substr() per attribute. I think silently removing all illegal attributes is too dangerous for the user, and checking them might get me into Unicode trouble. Now what about element names starting with #?

      BTW if I go this route I might as well add an option to generate the line/column attributes ;--)
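To give an idea of the cost, here is a rough sketch of such a filter in plain Perl. The atts_to_string helper is hypothetical and is not how XML::Twig actually prints attributes; it only shows the one-substr()-per-attribute check.

```perl
use strict;
use warnings;

# Hypothetical helper: serialize an attribute hash, skipping "special"
# attributes whose names start with '#'.
sub atts_to_string {
    my ($atts) = @_;
    my @out;
    for my $name (sort keys %$atts) {
        next if substr($name, 0, 1) eq '#';    # one substr() per attribute
        my $value = $atts->{$name};
        # minimal escaping for a double-quoted attribute value
        $value =~ s/&/&amp;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/"/&quot;/g;
        push @out, qq{$name="$value"};
    }
    return join ' ', @out;
}

my %atts = ( error => 'foo', '#line' => 3, '#column' => 2 );
print '<elt ', atts_to_string(\%atts), ">\n";    # prints <elt error="foo">
```

Only names starting with # are dropped; any other illegal name is still printed as-is, so nothing else is removed silently.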

        I ended up using a hash member after all:

        sub store_position
          { my( $t, $elt)= @_;
            my $line   = $t->{twig_parser}->current_line;    # $t->{twig_parser} is the expat object
            my $column = $t->{twig_parser}->current_column;
            $elt->{custom_atts}{position}= [$line, $column];  # crude but works
          }

        The compelling reason was that I can't assume the att() mechanism would handle anything other than strings, while a custom-data ability would be defined to handle any Perl object.

        I still agree that you should simply document the hash key. But, that also exposes that it =is= a hash, so having get/set custom data members would be better.

        I mentioned before that an advantage to using att() is that the normal selection mechanisms work. Here are two more: I can have an explicit line/col in an XML document, much as the #line directive is used in C/C++ -- it can reference the original document's source location. Second, when I add or transform elements (e.g. with split) the new ones don't have this row/col information. An att() could automatically inherit the value from the parent. A documented get_custom_data(KEY) function could also be programmed to inherit.
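As a sketch of what such accessors might look like, with inheritance from the parent: My::Elt, set_custom_data and the parent/custom fields below are all hypothetical and only illustrate the idea. None of this is an existing XML::Twig interface.

```perl
package My::Elt;    # hypothetical minimal element class, for illustration only
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    return bless { parent => $args{parent}, custom => {} }, $class;
}

# Store any Perl value (string, array ref, object...) under a key
sub set_custom_data {
    my ($self, $key, $value) = @_;
    $self->{custom}{$key} = $value;
    return $self;
}

# Look the key up on this element, then on each ancestor in turn
sub get_custom_data {
    my ($self, $key) = @_;
    for (my $e = $self; $e; $e = $e->{parent}) {
        return $e->{custom}{$key} if exists $e->{custom}{$key};
    }
    return;    # not found anywhere up the chain
}

package main;

my $parent = My::Elt->new;
$parent->set_custom_data(position => [ 3, 2 ]);

# A child created later (e.g. by a transformation) has no position of its own
my $child = My::Elt->new(parent => $parent);
my ($line, $col) = @{ $child->get_custom_data('position') };
print "child inherits position $line:$col\n";
```

Hiding the storage behind the two methods also avoids exposing that the element is a hash.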

        —John