blahblah has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm trying to get XML::SAX off the ground, but I'm not doing so well.
I'm doing:
#!/usr/bin/perl -Tw
use strict;
use CGI::Carp qw(fatalsToBrowser);
use XML::SAX::ParserFactory;  # dynamically load an available parser, or PurePerl if nothing else

my $userid  = "alex";
my $dataset = "hello2.xml";
my $file    = "/usr/home/$userid/$dataset";

my $handler = SAXHandler->new();
my $parser  = XML::SAX::ParserFactory->parser( Handler => $handler );
$parser->parse_uri($file);

package SAXHandler;

sub new {
    my $type = shift;
    return bless {}, $type;
}

sub start_document {
    my ($self, $element) = @_;
    print "Starting document...\n";
}

sub start_element {
    my ($self, $element) = @_;
    print "Starting element $element->{Name}\n";
}

sub end_element {
    my ($self, $element) = @_;
    print "Ending element $element->{Name}\n";
}

sub characters {
    my ($self, $characters) = @_;
    print "characters: $characters->{Data}\n";
}

1;

With hello2.xml looking like:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>

This is outputting the error:

Name contains invalid start character: '&#x3C;'

Now, I know that that is UTF8 for the "<" character. But XML needs to start with this character. I thought maybe it was the way I was calling parse_uri, so I tried opening a file handle and using parse_file. I also tried pulling the whole file into a string and using parse_string. I'm stumped.
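
To make the error message concrete, here is a minimal sketch in core Perl that decodes the `&#x3C;` reference and tests the result against a simplified name-start rule. The helper names `decode_hex_ref` and `is_name_start` are hypothetical, and the ASCII-only character class only approximates the XML rule; it is not how XML::SAX::PurePerl itself is implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decode a hexadecimal character reference such as
# "&#x3C;" into the character it names; returns undef for anything else.
sub decode_hex_ref {
    my ($ref) = @_;
    return $ref =~ /\A&#x([0-9A-Fa-f]+);\z/ ? chr(hex($1)) : undef;
}

# Hypothetical helper: ASCII-only approximation of the XML rule for the
# first character of a name (a letter, underscore, or colon).
sub is_name_start {
    my ($char) = @_;
    return $char =~ /\A[A-Za-z_:]\z/ ? 1 : 0;
}

my $char = decode_hex_ref('&#x3C;');
printf "decoded: %s (code point %d)\n", $char, ord($char);
printf "valid name start: %s\n", is_name_start($char) ? 'yes' : 'no';
```

The "<" is legal as the start of a tag but not as the first character of a name, so the message suggests the parser was expecting a name at the point where it read "<".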

Thanks,
Alex

P.S.- Are there any other pure-perl XML parsers out there? XML::SAX loads a billion modules and I just want something simple that doesn't require expat C compilation. Everything seems to rely on expat. I need portability.

Replies are listed 'Best First'.
•Re: PurePerl XML parsing
by merlyn (Sage) on Jun 16, 2002 at 19:54 UTC
      Anyone who decides to use this should know that it only implements a subset of XML. It is unlikely to cope well with things like external entities, and possibly character references and alternate document encodings. It's also based on a method of parsing XML (Shallow Parsing with Regexps) that hasn't been proven yet, and it was implemented specifically for SOAP::Lite because SOAP only permits a subset of XML.
(jeffa) Re: PurePerl XML parsing
by jeffa (Bishop) on Jun 16, 2002 at 20:02 UTC
    I just installed the latest XML::SAX, ran your code, and it produced the following results:
    Starting document...
    Starting element greeting
    characters: Hello, world!
    Ending element greeting
    
    with Perl 5.6.0 - you might need to upgrade or use merlyn's suggestion.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: PurePerl XML parsing
by Matts (Deacon) on Jun 16, 2002 at 20:11 UTC
    I can't replicate that problem here. Though perhaps this is a 5.00503 regexp problem. Seems unlikely though as XML::SAX::PurePerl parses on a character-by-character basis.

    Are you using the latest version?

    PS: This sort of thing should be taken offline, probably to rt.cpan.org where we keep the XML::SAX bug database.
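
One way to answer the "latest version" question is to ask Perl which version of each module it actually loads. The sketch below is an assumption-laden example, not part of XML::SAX: `installed_version` is a hypothetical helper, and it reports whatever copy is first on `@INC`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the version of an installed module,
# 'unknown' if it loads but declares no version, or undef if it
# cannot be loaded at all.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::SAX -> XML/SAX.pm
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module ('XML::SAX', 'XML::SAX::PurePerl') {
    my $version = installed_version($module);
    print "$module: ", (defined $version ? $version : 'not installed'), "\n";
}
```

Including the printed versions in a bug report makes it easier for others to reproduce the problem.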

Re: PurePerl XML parsing
by blahblah (Friar) on Jun 16, 2002 at 18:32 UTC
    I almost forgot...
    $] is 5.00503, which, if I read the docs right, shouldn't matter...?

    Alex
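
Because `$]` is a plain number (5.00503 for Perl 5.005_03, 5.006 for Perl 5.6.0), the two Perl versions mentioned in this thread can be compared numerically. The sketch below uses a hypothetical helper, `perl_at_least`; whether XML::SAX::PurePerl actually needs a newer Perl is not established here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the given version number (defaulting to
# the running interpreter's $]) is at least $minimum.
sub perl_at_least {
    my ($minimum, $current) = @_;
    $current = $] unless defined $current;
    return $current >= $minimum ? 1 : 0;
}

print "Running Perl ", $], "\n";
print "5.00503 is ",
    (perl_at_least(5.006, 5.00503) ? "at least" : "older than"),
    " 5.6.0\n";
```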
      Personally, I would use XML::Twig. Really you should decide exactly what functionality you require (DOM etc.) and then take a look at Kip Hampton's articles over at XML.com.
        I was using XML::Twig initially, but I need something PurePerl. Twig relies on XML::Parser, which relies in turn on expat.
        I've looked at every XML solution I could find, starting with Kip's good articles, but XML::SAX::PurePerl was the only PurePerl implementation I could find. I would be VERY interested in others.

        alex