Hello,

I have written a pretty straightforward XML::Twig-based script, but I am having a problem with the output XML. The input XML comes in with the following declarations:

    <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "../ead_dtd/ead.dtd" [
      <!ENTITY scrc_name SYSTEM "scrc_name.xml">
      <!ENTITY su_address SYSTEM "su_address.xml">
      <!ENTITY su_name SYSTEM "su_name.xml">
      <!ENTITY subjindex SYSTEM "subjindex.xml">
      <!ENTITY summitref SYSTEM "summit_ref.xml">
    ]>

But the result looks like this:

    <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "../ead_dtd/ead.dtd" [
      <!ENTITY scrc_name "Special Collections Research Center">
      <!ENTITY su_address '<address> <addressline>123 Elm St.<lb/></addressline> <addressline>Columbus, Oh 43021<lb/></addressline> </address> '>
      <!ENTITY su_name ... etc.

I have tried every combination of XML::Twig and XML::Parser options I can think of, but haven't really gotten anywhere. The closest I've come is switching to twig_roots, combined with twig_print_outside_roots and keep_encoding, but that makes the parser bail out when it encounters an entity reference. The declarations do print out almost right, though: they are only missing the square brackets.
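For readers unfamiliar with the twig_roots setup described above, here is a minimal sketch of how those options fit together. In twig_roots mode only the matching elements are built into twigs, so the handler itself has to print each element, while twig_print_outside_roots copies everything else through unchanged. The body of cont_break is hypothetical (the post does not show it), and the inline sample has no DOCTYPE, so this sketch does not reproduce the entity-reference failure.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical stand-in for the post's cont_break handler: it trims the
# whitespace around each <unitdate>, then prints the element, because in
# twig_roots mode nothing inside a root is output unless the handler does it.
sub cont_break {
    my ($twig, $unitdate) = @_;
    $unitdate->set_text($unitdate->trimmed_text);
    $unitdate->print;    # goes to the currently selected filehandle
    return 1;
}

my $xml = '<ead><archdesc><did><unitdate>  1900-1950  </unitdate></did></archdesc></ead>';

my $twig = XML::Twig->new(
    twig_roots               => { 'ead/archdesc/did/unitdate' => \&cont_break },
    twig_print_outside_roots => 1,    # copy markup outside the roots as-is
    keep_encoding            => 1,
);

# Capture the output in a string; select() plays the same role as
# "select XMLOUTPUT" in the original script.
open my $out, '>', \my $buffer or die "cannot open in-memory handle: $!";
my $old_fh = select $out;
$twig->parse($xml);
select $old_fh;
close $out;

print "$buffer\n";
```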

I am running ActivePerl 5.8.8 on Windows with XML::Twig 1.26 and XML::Parser 2.34-r1.

Here is the code, with my most recently failed attempts at specifying options:

    my $twig_handlers = { 'ead/archdesc/did/unitdate' => \&cont_break };
    my $twig = XML::Twig->new(
        TwigHandlers         => $twig_handlers,
        expand_external_ents => 0,
        NoExpand             => 1,
        ExpandExternalEnts   => 0,
        ParseParamEnt        => 0,
    );
    $twig->parsefile($xmlfile,
        NoExpand               => 1,
        ExpandExternalEnts     => 0,
        ParseParamEnt          => 0,
        expandEntityReferences => 0,
        SkipExternalDTD        => 1,
    );
    select XMLOUTPUT;
    $twig->print;    # re-output the XML, with the normalized dates
    close XMLOUTPUT;

I'd really appreciate any help anyone could lend.

UPDATE: I just tried this script with the development version of XML::Twig, 3.3.0, and it partially fixes the problem: the entity declarations are no longer expanded, but the entity references still are. It seems like an XML::Parser issue, but I can't come up with the right combination of options...

UPDATE 2: I got everything working perfectly on Linux by upgrading XML::Twig and also upgrading to the Expat 2.0.1 libraries. Unfortunately, I am having a really hard time getting the newer Windows Expat libraries working with XML::Parser. When I just swap in the new DLLs, I get an error about not finding the boot_XML__Parser__Expat symbol in Expat.dll.

UPDATE 3: mirod fixed this issue in the current development (3.30) version of XML::Twig. Thanks!
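For comparison, a hedged sketch of how the script might be tidied once the fix described in UPDATE 3 is in place: it assumes XML::Twig 3.30 or later keeps the entity declarations and references on its own, so the parser options that had no effect are dropped, and the twig prints straight to a filehandle instead of going through select. The cont_break body is a hypothetical stand-in, and the document is parsed from an inline string rather than from $xmlfile.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical handler: the post does not show cont_break, so this one
# just trims the whitespace inside each <unitdate>.
sub cont_break {
    my ($twig, $unitdate) = @_;
    $unitdate->set_text($unitdate->trimmed_text);
    return 1;
}

my $xml = '<ead><archdesc><did><unitdate>  1900-1950  </unitdate></did></archdesc></ead>';

my $twig = XML::Twig->new(
    twig_handlers => { 'ead/archdesc/did/unitdate' => \&cont_break },
);
$twig->parse($xml);    # with a real file: $twig->parsefile($xmlfile);

# print() accepts a filehandle, so select() is not needed; an in-memory
# handle stands in for the XMLOUTPUT file here.
open my $out, '>', \my $buffer or die "cannot open in-memory handle: $!";
$twig->print($out);
close $out;

print "$buffer\n";
```

Passing the handle explicitly avoids changing the globally selected filehandle, so any later plain print calls in the script still go to STDOUT.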


In reply to ignore XML entity declarations in XML::Twig? by cazzerson
