
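The problem is that your XML is not well-formed, so the parser dies, which is exactly what it is supposed to do. Setting the entity or the doctype in XML::Twig doesn't help, because the error comes from the parser (expat, through XML::Parser), which works at a lower level than XML::Twig and sees the raw document before any XML::Twig setting can take effect.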

You should include the DTD declaration in your documents, starting them with <!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN" "http://openebook.org/dtds/oeb-1.2/oebpkg12.dtd"> (in XML a public identifier must be followed by a system identifier). You can do this on the fly, BTW, by opening the file through a pipe, where dtd_declaration is a file holding just that line (open( my $package_fh, '-|', 'cat', 'dtd_declaration', 'real_file.xml' ) or die "cannot run cat: $!"; $twig->parse( $package_fh );). In fact it doesn't even matter whether the DTD is available or not, as expat will gladly ignore it (as a result, of course, the entities will not be expanded).
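If you would rather not shell out to cat, the same trick can be done in Perl: read the file, splice in the DOCTYPE, and hand XML::Twig the resulting string (its parse method accepts a string as well as a filehandle). This is a minimal sketch; with_doctype and parse_with_doctype are hypothetical helper names. It also copes with a file that starts with an XML declaration, which plain cat would break, because <?xml ...?> must be the very first thing in a document.

```perl
use strict;
use warnings;

# OEB 1.2 package DOCTYPE; XML requires the system identifier after the public one.
# expat does not fetch the external DTD, so the URL need not be reachable.
my $OEB_DOCTYPE = '<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN" '
    . '"http://openebook.org/dtds/oeb-1.2/oebpkg12.dtd">';

# Hypothetical helper: insert $doctype right after the XML declaration if there
# is one (it must stay first), otherwise at the very start of the document.
sub with_doctype {
    my ( $xml, $doctype ) = @_;
    return $xml if $xml =~ s/\A(<\?xml\b[^>]*\?>)/$1\n$doctype/;
    return "$doctype\n$xml";
}

# Hypothetical helper: slurp the file and let XML::Twig parse the patched string.
sub parse_with_doctype {
    my ( $file, $doctype ) = @_;
    open( my $fh, '<:raw', $file ) or die "cannot open $file: $!";
    my $xml = do { local $/; <$fh> };
    close $fh;
    require XML::Twig;    # CPAN module, loaded only when actually parsing
    my $twig = XML::Twig->new();
    $twig->parse( with_doctype( $xml, $doctype ) );
    return $twig;
}
```

Usage would be something like my $twig = parse_with_doctype( 'real_file.xml', $OEB_DOCTYPE );. Reading the file as raw bytes leaves encoding detection to expat, as it would be when parsing the file directly.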

Re^2: XML::Twig doctype and entity handling
by AZed (Monk) on Sep 08, 2008 at 18:40 UTC

    Ah, open-as-pipe does take care of the problem, thanks. I had been hoping that setting a doctype via doctype() would have prepended the assigned doctype declaration to the input, but if it doesn't, it doesn't.

    Unfortunately, it looks like the parser still processes the tags outside the twig roots, so even though the <metadata>...</metadata> chunk I want to work with is well-formed, the parser dies on the malformed junk around it before the twig I need is returned. Amusingly, this technique does work for splitting out the HTML without the <metadata> elements: twig_print_outside_roots prints the output as the parse goes, so it is complete by the time the parser dies on the mismatched tags at the end of the text.
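The twig_print_outside_roots behaviour described above can be sketched like this: only <metadata> is a twig root, its handler purges the element instead of printing it, and everything outside it is echoed verbatim. The helper name strip_metadata and the in-memory buffer are illustrative assumptions; the sketch also assumes XML::Twig prints outside-roots text to the currently selected filehandle (its default), so it temporarily selects a handle opened on a scalar.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: return $xml with every <metadata> element removed.
sub strip_metadata {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        # Root handler: throw the finished element away instead of printing it.
        twig_roots               => { metadata => sub { $_[0]->purge; 1 } },
        # Echo everything outside the roots to the selected filehandle.
        twig_print_outside_roots => 1,
    );
    open( my $out, '>', \my $buffer ) or die "cannot open in-memory handle: $!";
    my $previous = select($out);
    my $ok       = eval { $twig->parse($xml); 1 };
    my $error    = $@;
    select($previous);    # restore whatever handle was selected before
    close $out;           # flushes everything printed into $buffer
    # On malformed input the partial output is dropped and the error rethrown.
    die $error unless $ok;
    return $buffer;
}
```

With well-formed input this returns the document minus its metadata; on malformed HTML like the one described, the parse would still die at the end, so a real script would need to keep $buffer even when the eval fails.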

    Sysread it is, then.
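The sysread route might look something like this: read the raw bytes without any parser, cut out the <metadata> element with a regex, and parse only that chunk. This is a minimal sketch; slurp_with_sysread, extract_metadata and parse_metadata are hypothetical names, and the regex assumes a single, non-nested <metadata> element. The chunk must still be well-formed on its own, so undeclared entity references inside it would still make expat die.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file with sysread, so no XML parser
# ever sees the malformed parts of the document.
sub slurp_with_sysread {
    my ($file) = @_;
    open( my $fh, '<:raw', $file ) or die "cannot open $file: $!";
    my $buffer = '';
    while (1) {
        # Append up to 64 KiB at the end of $buffer on each call.
        my $got = sysread( $fh, $buffer, 65536, length $buffer );
        die "cannot read $file: $!" unless defined $got;
        last if $got == 0;    # end of file
    }
    close $fh;
    return $buffer;
}

# Hypothetical helper: return the first <metadata>...</metadata> chunk, or undef.
sub extract_metadata {
    my ($text) = @_;
    return $text =~ m{(<metadata(?:\s[^>]*)?>.*?</metadata\s*>)}s ? $1 : undef;
}

# Hypothetical helper: parse just the extracted chunk with XML::Twig.
sub parse_metadata {
    my ($file) = @_;
    my $chunk = extract_metadata( slurp_with_sysread($file) )
        // die "no <metadata> element found in $file\n";
    require XML::Twig;    # CPAN module, loaded only when actually parsing
    my $twig = XML::Twig->new();
    $twig->parse($chunk);
    return $twig;
}
```

Returning the twig rather than its root keeps the whole tree alive for the caller.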

    Thanks again.