harsha.reddy has asked for the wisdom of the Perl Monks concerning the following question:

I have a small Perl testing application which gulps XML data. Until now I have been using the XML::Mini parser, since XML::Mini has a DOM-like API.

Now I have decided to use a SAX-based parser (XML::Parser::PerlSAX).
I see that in the SAX approach the data is collected via callback functions.

This callback approach forces me to redesign things, that is, to call all the rest of the logic from these callback functions, which I don't want to do.

Instead, I want to read the XML bit by bit and parse it. So I decided to open() the file and read() it.
From the XML::Parser::PerlSAX documentation I found that this module has a method named:

XML::Parser::PerlSAX->location() which returns:

BytePosition The current byte position of the parse.
I am very much interested in this attribute "BytePosition"; the rest of my logic depends on it. But unfortunately I didn't find a direct way to call this XML::Parser::PerlSAX->location() function.
The questions I have are:

1. Are there any internal hooks that can return the "BytePosition - The current byte position of the parse"?
2. How can I lazily evaluate the callback routines of XML::Parser::PerlSAX to fetch limited data on request?

Re: XML-Parser-PerlSAX how to get the BytePosition value
by Anonymous Monk on Apr 01, 2008 at 07:24 UTC
    I don't understand question 1; location is a method.
    http://search.cpan.org/src/KMACLEOD/libxml-perl-0.08/lib/XML/Parser/PerlSAX.pm
    sub location {
        my $self = shift;

        my $expat = $self->{Expat};

        my @properties = ( ColumnNumber => $expat->current_column,
                           LineNumber   => $expat->current_line,
                           BytePosition => $expat->current_byte,
                           Base         => $expat->base );

        # FIXME these locations change while parsing external entities
        push (@properties, PublicId => $self->{Source}{PublicId})
            if (defined $self->{Source}{PublicId});
        push (@properties, SystemId => $self->{Source}{SystemId})
            if (defined $self->{Source}{SystemId});

        return { @properties };
    }

      Yes, I saw that piece of code before posting the above message.
      In item number 1:

      1. Are there any internal hooks that can return the "BytePosition - The current byte position of the parse"?

      what I meant is:

      Whatever data processing I do happens in callback routines, whereas XML::Parser::PerlSAX->location() is not part of a callback routine; it is a built-in method of XML::Parser::PerlSAX.

      AFAIK, I can't call this function (XML::Parser::PerlSAX->location) inside any of the callback routines. So the only time I can call it is when I have finished parsing my entire XML doc. (Correct me if I am wrong.)

        1. Create sub start_document in your handler package. It can be empty.
        2. Create sub set_document_locator and keep the locator it receives.
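        use Data::Dumper;    # needed for Dumper() below

        sub new {
            my $proto = shift;
            my $class = ref($proto) || $proto;
            my $self  = {};
            return bless $self, $class;
        }

        sub start_document {
            # Empty. But MUST exist.
        }

        sub set_document_locator {
            my ($self, $params) = @_;
            # Keep the locator so the other callbacks can call location() on it.
            $self->{'_parser'} = $params->{'Locator'};
        }

        sub start_element {
            my $self = shift;
            # location() returns a hash ref that includes BytePosition.
            print Dumper($self->{'_parser'}->location());
        }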
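
For anyone who wants to try this end to end, here is a minimal sketch of a complete script built on the handler above. The package name PositionHandler, the 'locator' and 'positions' hash keys, and the sample XML string are made up for illustration; only new(), parse(), the Handler and Source options, and location() come from XML::Parser::PerlSAX. It records the BytePosition reported at each start tag instead of dumping the whole location hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

package PositionHandler;    # hypothetical handler package

sub new {
    my ($class) = @_;
    return bless { positions => [] }, $class;
}

# Can be empty, but must exist (see step 1 above).
sub start_document { }

# Called with a hash ref whose Locator entry supports location().
sub set_document_locator {
    my ($self, $params) = @_;
    $self->{locator} = $params->{Locator};
}

# Record the element name and the byte position reported for it.
sub start_element {
    my ($self, $element) = @_;
    my $loc = $self->{locator}->location();
    push @{ $self->{positions} }, [ $element->{Name}, $loc->{BytePosition} ];
}

package main;

# Sample input, made up for this sketch.
my $xml = '<root><item>a</item><item>b</item></root>';

my $handler = PositionHandler->new;
my $parser  = XML::Parser::PerlSAX->new( Handler => $handler );
$parser->parse( Source => { String => $xml } );

printf "%-6s reported at byte %d\n", @$_ for @{ $handler->{positions} };
```

Because the positions are stored on the handler object, the rest of the program can read them after parse() returns instead of doing all of its work inside the callbacks.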