I am trying to parse an XML file with XML::XPath (from CPAN) under Windows. This works fine on Unix but not on Windows, because Perl appears to be confusing the Windows drive letter with the mechanism specifier, which is not an arbitrary string but an introducer followed by the arbitrary part, as in file:///foo.xml. This is a URI. I have this DOCTYPE line in my XML file:

<?xml version="1.0"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN" "oebpkg101.dtd" >

The last argument to DOCTYPE is the system identifier, which names the DTD file that defines the grammar for the XML file. XML specifies that a relative system identifier is resolved against the location of the XML file, and the XML file and the DTD are in the same directory, so the parse using XML::XPath should work. Here's the (slightly simplified) code:

$currdir = cwd();
$opffile = "${unencdir}\\${t}";    # passed into the sub
$parser  = XML::XPath::XMLParser->new(filename => $opffile);
eval { $doc = $parser->parse };
if ((!defined $doc) || $@) {
    print STDERR "Parse of ${opffile} failed $@\n";
    print STDERR "Can't obtain needed information from opf...aborting\n";
    exit 17;
}

The output (on stderr):

Parse of g:\docset\asha\mydoc\books\66007\noDRM\66007.opf failed

501 Protocol scheme 'g' is not supported g:/oebpkg101.dtd Handler couldn't resolve external entity at line 2, column 92, byte 115

error in processing external entity reference at line 2, column 92, byte 115:

<?xml version="1.0"?>

<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN" "oebpkg101.dtd" >

===========================================================================================^

<package unique-identifier="uid" xmlns="http://openebook.org/namespaces/oeb-package/1.0/" >

<metadata> at C:/Perl/lib/XML/Parser.pm line 187

Can't obtain needed information from opf...aborting

I realize there's a fair amount of XML in here, but I believe the XML is valid, and it works fine on Unix. It appears that the XML::XPath module is confusing the g: Windows drive letter with the mechanism specifier. I sure hope I don't have to change all those "oebpkg101.dtd" instances to file:///oebpkg101.dtd or something similar; there are thousands of them. I've seen similar posts elsewhere on the net but never found a satisfactory answer.
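
To illustrate what I think is going on, here is a minimal sketch of the generic URI scheme rule (a letter followed by letters, digits, "+", "-" or ".", then a colon). The helper scheme_of is hypothetical and is not part of XML::XPath or XML::Parser; I am only assuming that whatever resolves the external entity applies a rule like this one, which would explain why g: is reported as an unsupported protocol scheme.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: return the lowercased scheme
# of an identifier under the generic URI scheme rule, or undef if the
# identifier has no scheme (i.e. it is a relative reference).
sub scheme_of {
    my ($id) = @_;
    return $id =~ /^([A-Za-z][A-Za-z0-9+.\-]*):/ ? lc $1 : undef;
}

# The relative system identifier from the DOCTYPE, the identifier shown in
# the error message, and an explicit file: URI for comparison.
for my $id ('oebpkg101.dtd', 'g:/oebpkg101.dtd', 'file:///g:/oebpkg101.dtd') {
    my $scheme = scheme_of($id);
    printf "%-28s scheme: %s\n", $id,
        defined $scheme ? $scheme : '(none, relative)';
}
```

Under that rule the drive-letter form parses with scheme "g", while the file: form parses with scheme "file", which matches the 'g' named in the error above.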

TIA.

Avi Shapiro


In reply to Perl XPATH parse difficulty by shapavi
