vaevictus has asked for the wisdom of the Perl Monks concerning the following question:
I have perl v5.6.1 built for i686-linux, running on Slackware. I'm trying to install XML::DOM, which depends on XML::Parser::PerlSAX, which is part of libxml-perl.
I'm getting a failure on test 11 of stream.t (in PerlSAX).
I suspect this is an encoding issue that I just don't understand. Here is what I think it is choking on; my question is why the test is written this way, or why my setup doesn't handle it properly.
The test:
print (($string eq $expected) ? "ok 11\n" : "not ok 11\n");
The $string
$string = $parser->parse(Source => { Encoding => 'ISO-8859-1', String => <<"EOF;" } );
<!DOCTYPE foo [
<!NOTATION bar PUBLIC "qrs">
<!ENTITY zinger PUBLIC "xyz" "abc" NDATA bar>
<!ENTITY fran "fran-def">
<!ENTITY zoe "zoe.ent">
]>
<foo>
First line in foo
<boom>Fran is &fran; and Zoe is &zoe;</boom>
<bar id="jack" stomp="jill">
<?line-noise *&*&^&<< ?>
1st line in bar
<blah> 2nd line in bar </blah>
3rd line in bar <!-- Isn't this a doozy -->
</bar>
<zap ref="zing" />
This, '\240', would be a bad character in UTF-8.
</foo>
EOF;
produces:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
First line in foo
<boom>Fran is fran-def and Zoe is zoe.ent</boom>
<bar id="jack" stomp="jill">
<?line-noise *&*&^&<< ?>
1st line in bar
<blah> 2nd line in bar </blah>
3rd line in bar <!-- Isn't this a doozy -->
</bar>
<zap fubar="1" ref="zing"></zap>
This, 'Â ', would be a bad character in UTF-8.
</foo>
The $expected
$expected = <<"EOF;";
<?xml version="1.0" encoding="UTF-8"?>
<foo>
First line in foo
<boom>Fran is fran-def and Zoe is zoe.ent</boom>
<bar id="jack" stomp="jill">
<?line-noise *&*&^&<< ?>
1st line in bar
<blah> 2nd line in bar </blah>
3rd line in bar <!-- Isn't this a doozy -->
</bar>
<zap fubar="1" ref="zing"></zap>
This, '\302\240', would be a bad character in UTF-8.
</foo>
EOF;
produces:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
First line in foo
<boom>Fran is fran-def and Zoe is zoe.ent</boom>
<bar id="jack" stomp="jill">
<?line-noise *&*&^&<< ?>
1st line in bar
<blah> 2nd line in bar </blah>
3rd line in bar <!-- Isn't this a doozy -->
</bar>
<zap fubar="1" ref="zing"></zap>
This, 'ÃÂ ', would be a bad character in UTF-8.
</foo>
The extra character in the second "This ... would be a bad character" line is coded into the test itself, and I'm confused as to why it's there.
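For anyone trying to follow the byte values involved, here is a minimal sketch of how the single ISO-8859-1 byte `\240` relates to the two bytes `\302\240` in the expected string, and what the bytes look like if the UTF-8 encoding is applied a second time. It assumes Perl 5.8 or later, where the Encode module is in core; the perl v5.6.1 in the question does not ship Encode, so this only illustrates the byte arithmetic and is not a reproduction of what the test or the parser does. The helper `octal_dump` is a made-up name for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);   # assumption: Perl 5.8+, not the 5.6.1 from the question

# Hypothetical helper: show each byte of a string as a \ooo octal escape.
sub octal_dump {
    my ($bytes) = @_;
    return join '', map { sprintf('\\%03o', ord($_)) } split(//, $bytes);
}

# The single byte \240 (0xA0) as written in the ISO-8859-1 source document.
my $latin1_bytes = "\240";

# Decode it to the character U+00A0, then encode that character as UTF-8.
my $char       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $char);

# Encoding a second time: each UTF-8 byte is treated as a Latin-1 character.
my $double = encode('UTF-8', decode('ISO-8859-1', $utf8_bytes));

print octal_dump($utf8_bytes), "\n";   # \302\240
print octal_dump($double),     "\n";   # \303\202\302\240
```

Viewed as Latin-1, `\302\240` displays as 'Â ' and `\303\202\302\240` displays as 'ÃÂ ', which matches the two outputs shown above. One possible reading is that the two strings hold the same text at different stages of encoding, but that is a guess rather than something the test output confirms.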
Re: XML::Parser::PerlSax ... not passing a simple test
by mirod (Canon) on Nov 07, 2001 at 21:09 UTC
by vaevictus (Pilgrim) on Nov 07, 2001 at 21:13 UTC
Re: XML::Parser::PerlSax ... not passing a simple test
by mitd (Curate) on Nov 09, 2001 at 06:26 UTC
XML::Parser::PerlSax ... still not passing a simple test
by vaevictus (Pilgrim) on Jul 19, 2004 at 17:28 UTC