My application includes a SAX filter which parses a simple markup language into XML elements. It runs as part of an XML::SAX::Machines Pipeline within a mod_perl2 handler.

I'm finding that the characters method of my filter emits characters for only two HTTP requests after Apache is started and, for any subsequent HTTP requests, it does not emit characters, although it does continue to emit elements correctly.

I checked the logic of my filter quite carefully and when everything seemed fine, I tried checking to make sure that XML::SAX::Base was receiving the character data my filter was emitting. I did this by altering XML::SAX::Base's characters method thus:

    sub characters {
        my $self = shift;
        print "\nXML::SAX::Base::characters Received DATA: |" . $_[0]->{Data} . "|\n";
        if (defined $self->{Methods}->{'characters'}) {
            $self->{Methods}->{'characters'}->(@_);
        }
        else {
            my $method;
            my $callbacks;
            ...

i.e., I inserted that print statement.

I've found that, when characters are successfully emitted, the print statement gets executed twice. But when the filter stops emitting characters, that statement gets executed only once.

The filter's characters implementation parses the supplied character data, looks for instances of the simple markup language elements and emits chunks of the original character data along with newly generated XML elements.

The filter overrides XML::SAX::Base's start_element method like this:

    sub start_element {
        my ($self, $element) = @_;
        $self->{parsing_markup} = allow_markup($element->{Name});
        $self->SUPER::start_element($element);
    }

Here, allow_markup is a function that determines whether the simple markup language should be applied to the content of a given element in the source XML.

The filter also overrides characters like this:

    sub characters {
        my ($self, $chars) = @_;
        if ($self->{parsing_markup}) {
            $self->parse_markup($chars->{Data});
        }
        else {
            $self->SUPER::characters({Data => $chars->{Data}});
        }
    }

This either sends the character data to parse_markup or hands it straight on to XML::SAX::Base's characters method.

The parse_markup method is quite complicated, but its functioning boils down to a mixture of $self->SUPER::start_element, $self->SUPER::end_element, and $self->SUPER::characters calls. The start_element and end_element calls are very likely to be correct as I always get the appropriate tags in the output. But there could be something going awry with the characters calls as this is where the data is going missing.
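A minimal sketch of that kind of chunking, assuming a hypothetical markup in which *text* becomes an em element. The real parse_markup is more involved and is not shown here; split_markup and the markup syntax are invented for illustration. Instead of calling the SUPER methods, it returns the list of events it would emit, so it runs without XML::SAX::Base:

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse_markup: rather than calling
# $self->SUPER::start_element and friends, it returns the events
# it would emit, in order.
# Assumed markup (not from the real filter): *text* becomes <em>text</em>.
sub split_markup {
    my ($chars) = @_;
    my @events;
    my $from = 0;    # start of the next plain-text chunk

    while ($chars =~ /\*([^*]+)\*/g) {
        my $upto = $-[0];    # offset where the match begins

        # Emit the plain text before the markup, skipping empty chunks.
        push @events, [characters => substr $chars, $from, $upto - $from]
            unless $upto - $from <= 0;

        push @events, [start_element => 'em'],
                      [characters    => $1],
                      [end_element   => 'em'];

        $from = $+[0];       # offset just past the match
    }

    # Emit whatever plain text follows the last piece of markup.
    my $upto = length $chars;
    push @events, [characters => substr $chars, $from, $upto - $from]
        unless $upto - $from <= 0;

    return @events;
}

# Print one event per line for a small sample string.
print join(' ', @$_), "\n" for split_markup('plain *bold* tail');
```

Collecting the events in a list like this makes it possible to check the chunking logic on its own, outside Apache, before looking at how the events travel through the pipeline.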

The call to $self->SUPER::characters looks like this:

    my $c = {Data => substr $chars, $from, $upto - $from};
    unless ($upto - $from <= 0) {
        print "\n=> calling SUPER::characters with " . Dumper($c) . "\n";
    }
    $self->SUPER::characters($c) unless ($upto - $from <= 0);

This includes some more debugging output (the conditional print call), and that output is always what I would expect.

I'm fairly sure that this must have something to do with Apache or mod_perl. But I'm now at a loss as to how to debug further. Any suggestions?

Perl: v5.14.2; mod_perl: 2.0.5; Apache: 2.2.22; XML::SAX::Base: 1.07; all installed from Debian packages from the unstable archive.


In reply to SAX filter in mod_perl by ironchicken
