in reply to XML::Twig::flush() and html/xml entities

I'm going to seriously simplify your question. Here's the code:

#!/usr/bin/perl
use strict;
use XML::Twig;

binmode(STDOUT, ":utf8");

my $t= XML::Twig->new();
$t->set_keep_encoding;
$t->parse(do { local $/; <DATA>});
$t->flush;
exit 0;

__END__
<?xml version="1.0" encoding="UTF-8"?>
<harvest>
<subject>Computation &amp; Language</subject>
<subject>Computer Science - Computation &amp; Language</subject>
</harvest>

And here's the output:
<?xml version="1.0" encoding="UTF-8"?>
<harvest><subject>Computation & Language</subject><subject>Computer Science - Computation & Language</subject></harvest>

And you want to change the &'s to &amp;'s. The solution seems to be to remove the call to set_keep_encoding. When I remove that, the output becomes what you want. Whether that's a bug in the keep-encoding or the flush or whatever, I don't know. Hopefully mirod can help here ;-)

Update: It appears I was a few minutes behind mirod on this. Oops. :-)

Re^2: XML::Twig::flush() and html/xml entities
by mirod (Canon) on Oct 05, 2006 at 19:02 UTC

    Indeed, there is a bug in set_keep_encoding. If you put the option in the new, then the code runs fine. I have to look at it: a naive attempt at fixing it generates a boatload of errors in the tests.
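
For reference, putting the option "in the new" might look roughly like the sketch below. It assumes the keep_encoding constructor option of XML::Twig and uses a here-document and sprint (rather than DATA and flush) only so that the result can be inspected as a string; the variable names $xml and $out are illustrative and not taken from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

binmode(STDOUT, ":utf8");

# Sample input, shortened from the script in the parent post.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<harvest>
<subject>Computation &amp; Language</subject>
</harvest>
XML

# keep_encoding is passed to the constructor here, instead of being
# switched on afterwards with $t->set_keep_encoding.
my $t = XML::Twig->new(keep_encoding => 1);
$t->parse($xml);

# sprint returns the serialized document as a string.
my $out = $t->sprint;
print $out;
```

If this behaves as described in the reply, the ampersand should still be written as &amp; in the output. Dropping the option altogether, as suggested in the parent post, is the other route to the same result.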

Re^2: XML::Twig::flush() and html/xml entities
by mandarin (Hermit) on Oct 06, 2006 at 08:38 UTC
    I can't remember exactly when and why I added the call to set_keep_encoding, but I think it was due to problems with the encoding of the output.
    Maybe those were fixed by adding the binmode(STDOUT,":utf8") line.
    I'm quite new to Perl, to XML and to UTF-8 alike, so development went somewhat on a trial and error basis ;-)
    Anyway, it works without set_keep_encoding. Fine :-)
    Thanks a lot to anyone helping, esp. you and mirod
Re^2: XML::Twig::flush() and html/xml entities
by mirod (Canon) on Oct 06, 2006 at 13:23 UTC