joeperl has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I have a set of XML files in which each file references another, for example A->B->C->D and so on. (Please refer to http://www.perlmonks.com/?node_id=817466 for my earlier query.)

The intent of the script is to take information from all of these XMLs and then work on that data.

The script worked fine under Perl 5.8.8 but gives a peculiar error when run under Perl 5.8.9.

The error is "xml declaration not at start of external entity at line 37, column 11, byte 1307 at /home/tools/perl-5.12.1/Linux-64bit/lib/site_perl/5.12.1/x86_64-linux/XML/Parser.pm line 187"

The error seems to be coming because of the first line in each of the XMLs, which is <?xml version="1.0" encoding="ISO-8859-1"?>, which I learned provides information regarding the type of encoding used.

So where am I going wrong ??

Thanks for your help

Regards,
joe

Sample top-level XML file:


<?xml version="1.0" encoding="ISO-8859-1"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <module name="TOP">
    <moduleref name="module_a"/>
    <moduleref name="module_b"/>
    <moduleref name="module_c"/>
    <data>data</data>
    <data>data</data>
  </module>
  <xi:include href="$WORK_ROOT/MOD_A/module_a.xml"/>
  <xi:include href="$WORK_ROOT/MOD_B/module_b.xml"/>
  <xi:include href="$WORK_ROOT/MOD_C/module_c.xml"/>
</top>


Sample module XML file:


<?xml version="1.0" encoding="ISO-8859-1"?>
<record xmlns:xi="http://www.w3.org/2001/XInclude">
  <module type="record" name="module_a">
    <data>data</data>
    <data>data</data>
  </module>
</record>


Now for the code I use to process these XML files.

This code expands the top-level XML file:


use strict;
use XML::DOM;
use XML::SAX;
use XML::SAX::Writer;
use FindBin;
use lib "$FindBin::Bin";
use PATHREF;    # modified XML::Filter::XInclude, loaded from the script's directory
use IO::File;
use File::Find;
use File::Copy;
use Cwd;

my $input_file  = "top.xml";
my $output_file = "output.xml";

# Open the file that will receive the expanded document.
my $output = IO::File->new($output_file, 'w')
    or die "Cannot open $output_file for writing: $!";

print "Expanding include tags...\n";

# SAX pipeline: parser -> XInclude filter -> writer.
my $parser = XML::SAX::ParserFactory->parser(
    Handler => XML::Filter::XInclude->new(
        Handler => XML::SAX::Writer->new(Output => $output)
    )
);
$parser->parse_uri($input_file);
$output->close;


This code parses the expanded top.xml, and this is where I get the error:


$parser = XML::DOM::Parser->new;
my $doc = $parser->parsefile($output_file);


Notes:
1. The module PATHREF is actually XML::Filter::XInclude, modified to process the environment variable $WORK_ROOT.
2. I also ran the script under both 5.8.9 and 5.12.1, and got the same error in each case.



PS: This info might help.
I tried removing the <?xml version="1.0" encoding="ISO-8859-1"?> lines from the expanded top.xml and there were no errors! Does that mean I need not give this information in each of the XML files, or do I have to remove it after every expansion to avoid the error?

Replies are listed 'Best First'.
Re: XML parser error
by ikegami (Patriarch) on Jun 18, 2010 at 20:09 UTC

    XML::Filter::XInclude appears to be a little too naïve. If the included XML uses the same encoding as the including XML, I'd try removing the <?xml?> line from the included XML. If that doesn't work, you'll have to get the bug fixed, switch to UTF-8, or find an alternative solution. Either way, you should report the bug.
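
    As a rough illustration of removing the <?xml?> line from an included file, here is a minimal sketch that works on the file's text. The helper name strip_xml_declaration is hypothetical, and the sketch assumes the declaration is the first thing in the text (no byte order mark in front of it).

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading XML declaration from the text of a
# document that is about to be included in another document.
# Assumes there is no byte order mark before the declaration.
sub strip_xml_declaration {
    my ($xml) = @_;
    $xml =~ s/\A\s*<\?xml\s[^?]*\?>\s*//;
    return $xml;
}

# Sample module file, modelled on the one in the question.
my $module_xml = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<record xmlns:xi="http://www.w3.org/2001/XInclude">
  <module type="record" name="module_a">
    <data>data</data>
  </module>
</record>
XML

my $stripped = strip_xml_declaration($module_xml);
print $stripped;
```

    Once the declaration is gone, nothing in the file names its encoding, so this is only safe if the parser's default (UTF-8) or the including document's encoding matches the file's actual bytes.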

    By the way, you said you're using Perl 5.8.9, but the message implies you're using 5.12.1??

      I tried switching to UTF-8, but the problem persists.

        Switching to UTF-8 was to be done after and in addition to removing the <?xml?> directive from the included XML, if needed.

        You'll definitely get that warning as long as the <?xml?> line is in the included XML due to the bug. I was trying to foresee and fix problems you might have after removing the directive. You might have problems parsing an included file without its <?xml?> directive if it's not encoded using UTF-8. You might have problems including a file into a document that has a different encoding than the included file (but probably not).

        It's probably a good idea to use UTF-8 everywhere anyway, if you have a choice.
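
        If you do move the files to UTF-8, the conversion could look something like the sketch below. It uses the core Encode module; the helper name latin1_to_utf8 is hypothetical, and the sketch assumes the input really is ISO-8859-1 and that its declaration names that encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take the raw bytes of an ISO-8859-1 document and
# return UTF-8 bytes, updating the encoding named in the XML declaration.
sub latin1_to_utf8 {
    my ($bytes) = @_;

    # Bytes -> Perl characters.
    my $text = decode('ISO-8859-1', $bytes);

    # Rewrite encoding="ISO-8859-1" inside the declaration, if present.
    $text =~ s/(<\?xml\b[^?]*\bencoding\s*=\s*)(["'])ISO-8859-1\2/$1${2}UTF-8$2/i;

    # Perl characters -> UTF-8 bytes.
    return encode('UTF-8', $text);
}

# Sample input containing one non-ASCII character (e-acute as byte 0xE9).
my $latin1 = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<data>caf\xE9</data>\n};
my $utf8   = latin1_to_utf8($latin1);

binmode STDOUT;    # write the converted bytes unchanged
print $utf8;
```

        For real files, read and write them in binary mode so that no extra encoding layer interferes with the conversion.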

Re: XML parser error
by almut (Canon) on Jun 18, 2010 at 20:03 UTC
    So where am I going wrong ??

    Not showing us the code plus some XML sample input that could potentially allow us to reproduce the problem... :)

    There's nothing wrong with <?xml version="1.0" encoding="ISO-8859-1"?> as is, so there must be something else (the context in which it occurs, ...?), or you've found a bug.
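
    One way to check the context is to look for declarations that are not at the very start of the text being parsed. The following is a minimal sketch; the helper name stray_declaration_offsets and the sample text are made up for illustration and are not taken from the poster's files.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offsets of every XML declaration that
# does not sit at the very start of the document text.
sub stray_declaration_offsets {
    my ($xml) = @_;
    my @offsets;
    while ($xml =~ /<\?xml\s/g) {
        my $start = $-[0];    # offset where this match begins
        push @offsets, $start if $start > 0;
    }
    return @offsets;
}

# Made-up sample: a document with a second declaration embedded in it.
my $expanded = join '',
    qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n},
    qq{<top>\n},
    qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n},
    qq{<record/>\n},
    qq{</top>\n};

my @offsets = stray_declaration_offsets($expanded);
print "stray declaration at offset $_\n" for @offsets;
```

    If this reports anything for the file you are parsing, the declaration itself is fine but its position is not: it is only allowed at the very beginning of a document or external entity.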

    P.S.:

    ...when run in perl-5.8.9

    The path /home/tools/perl-5.12.1/... from the error message suggests you're using 5.12.1, not 5.8.9.