shapavi has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to do a XML::XPATH (out of CPAN) parse under windows. This works fine on Unix but not on Windows, because Perl appears to be confused about the mechanism specifier (the URI scheme), which is not an arbitrary string but an introducer followed by the arbitrary part, as in file:///foo.xml. So what happens is I have this doctype line in my xml file:

<?xml version="1.0"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN" "oebpkg101.dtd" >

The last argument to DOCTYPE is the system identifier, which names the dtd file that defines the grammar for the XML file. XML specifies that a relative system identifier is resolved relative to the location of the XML file, and the XML file and the dtd are in the same directory, so the parse using XML::XPATH should work. Here's the (slightly simplified) code:

use Cwd;          # provides cwd()
use XML::XPath;   # also loads XML::XPath::XMLParser

$currdir = cwd();
$opffile = "${unencdir}\\${t}";   # passed into the sub
$parser  = XML::XPath::XMLParser->new(filename => $opffile);
eval { $doc = $parser->parse };
if ((!defined $doc) || $@) {
    print STDERR "Parse of ${opffile} failed $@\n";
    print STDERR "Can't obtain needed information from opf...aborting\n";
    exit 17;
}

The output (on stderr):

Parse of g:\docset\asha\mydoc\books\66007\noDRM\66007.opf failed

501 Protocol scheme 'g' is not supported g:/oebpkg101.dtd

Handler couldn't resolve external entity at line 2, column 92, byte 115

error in processing external entity reference at line 2, column 92, byte 115:

<?xml version="1.0"?>

<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN" "oebpkg101.dtd" >

===========================================================================================^

<package unique-identifier="uid" xmlns="http://openebook.org/namespaces/oeb-package/1.0/" >

<metadata> at C:/Perl/lib/XML/Parser.pm line 187

Can't obtain needed information from opf...aborting

I realize there's a fair amount of XML in here. But I believe the XML is valid. It works fine on Unix. It appears that the XPATH module is confused about the g: windows device name and the mechanism specifier. I sure hope I don't have to change all those "oebpkg101.dtd" instances to file:///oebpkg101.dtd or something similar; there are thousands of them. I've seen similar posts elsewhere on the net but never got a satisfactory answer.

TIA.

Avi Shapiro

Re: Perl XPATH parse difficulty
by ikegami (Patriarch) on Jun 18, 2010 at 17:37 UTC

    I am trying to do a XML::XPATH (out of CPAN) parse under windows

    I presume you mean XML::XPath.

    It appears that the XPATH module is confused about the g: windows device name and the mechanism specifier.

    More specifically, the module assumes paths and URIs are interchangeable.
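A minimal sketch of that mix-up, assuming the CPAN URI module (the one LWP depends on) is installed. The path is the one from the error output in the question; the variable names are made up for illustration. Read as a URI, the drive letter looks like a scheme, whereas URI::file can turn the same string into a file: URI against which the relative DTD name resolves as expected:

```perl
use strict;
use warnings;
use URI;
use URI::file;

# Path taken from the error output; variable names are illustrative only.
my $win_path = 'g:\docset\asha\mydoc\books\66007\noDRM\66007.opf';

# Parsed as a URI, everything before the first colon is the scheme.
my $as_uri = URI->new($win_path);
print "scheme seen by URI: ", $as_uri->scheme, "\n";

# Treating the string as a Windows file name gives a real file: URI.
# The explicit 'win32' argument makes this behave the same on any OS.
my $base = URI::file->new($win_path, 'win32');
print "base URI: $base\n";

# The relative system identifier from the DOCTYPE, resolved against it.
my $dtd = URI->new_abs('oebpkg101.dtd', $base);
print "DTD URI:  $dtd\n";
```

This only illustrates the path-versus-URI distinction; it is not a patch for XML::XPath or XML::Parser.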

    I've seen similar posts elsewhere on the net but never got a satisfactory answer.

    There's a bug in the module that needs fixing. ( Actually, it would be in XML::Parser, XML::Parser::Expat or maybe even the C library expat. )

    I bet XML::LibXML doesn't have that bug.
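For anyone wanting to try that route, a rough sketch of the equivalent XML::LibXML calls, assuming the module is installed. The XML below is a stand-in built from the fragments in the question, the 'opf' prefix is an arbitrary choice, and load_ext_dtd => 0 is an assumption that the DTD is not needed for the queries. Whether XML::LibXML resolves the relative DTD correctly on Windows is not verified here.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Stand-in for the .opf file, built from the fragments shown in the question.
my $xml = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN" "oebpkg101.dtd">
<package unique-identifier="uid" xmlns="http://openebook.org/namespaces/oeb-package/1.0/">
  <metadata/>
</package>
XML

# load_ext_dtd => 0 skips the external DTD lookup altogether (assumption:
# nothing in the queries depends on the DTD). For a real file one would
# pass location => $opffile instead of string => $xml.
my $doc = XML::LibXML->load_xml(string => $xml, load_ext_dtd => 0);

# The document uses a default namespace, so XPath needs a prefix for it.
# 'opf' is an arbitrary prefix chosen for this sketch.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(opf => 'http://openebook.org/namespaces/oeb-package/1.0/');

my $uid_attr = $xpc->findvalue('/opf:package/@unique-identifier');
print "unique-identifier: $uid_attr\n";
```

Skipping the DTD means entities declared only in the DTD would not be expanded, so this fits best when the goal is just to pull values out of the file.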

      Thanks for the help. I'll take a look at XML::LibXML to see if it does the job. Avi