Parsing CData nodes with LibXML

silvertip257 has asked for the wisdom of the Perl Monks concerning the following question:

I have decided to use LibXML to parse XML, but I have had difficulty getting any output from CData nodes. Below is a small test case that I used to try to get something working. So far I've had no luck. Any help is welcome, but I want to come to a solution using LibXML (not other parsers).

Thanks, Mike

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>
    <![CDATA[
      Here's a bunch of fun text that I want to get a substring out of.
    ]]>
  </child>
</root>

#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;

my $in = "testcd.xml";

my $parser = XML::LibXML->new;
my $doc = $parser->parse_file( $in ) or die("Cannot parse input file");

my $root = $doc->documentElement();

#my $val = ($root->getElementsByTagName("child"))[0]->nodeValue;
my @ch = $root->getElementsByTagName("child");

my $val = $ch[0]->firstChild->nodeType;
print $val;

my $cdata =
XML::LibXML::CDATASection->new(($root->getElementsByTagName("child"))[0]->nodeV$

$cdata = $cdata->data;

print $cdata;

exit 0;


Replies are listed 'Best First'.
Re: Parsing CData nodes with LibXML by Your Mother (Archbishop) on Apr 07, 2010 at 04:42 UTC
Don't forget about `keep_blanks`. It'll toss the empties if turned off.

use XML::LibXML;

my $parser = XML::LibXML->new;
$parser->keep_blanks(0);
my $doc = $parser->parse_string(<<'__XML__');
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>
    <![CDATA[
      Here's a bunch of fun text that I want to get a substring out of.
    ]]>
  </child>
</root>
__XML__

print $doc->getDocumentElement->firstChild->textContent, $/;

Re: Parsing CData nodes with LibXML by Anonymous Monk on Apr 07, 2010 at 03:32 UTC
Don't forget about the examples and tests :) http://search.cpan.org/dist/XML-LibXML/MANIFEST

#!/usr/bin/perl --
use strict;
use warnings;
use XML::LibXML;

Main(@ARGV);
exit(0);

sub Main {
    my $doc = XML::LibXML->load_xml( string => <<'__XML__');
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>
    <![CDATA[
      Here's a bunch of fun text that I want to get a substring out of.
    ]]>
  </child>
</root>
__XML__

    for my $nod ( $doc->findnodes('/root/child/text()') ) {
        print "$nod\n", $nod->string_value, "\n-----\n";
    }
} ## end sub Main
__END__
XML::LibXML::Text=SCALAR(0xa12c14)

-----
XML::LibXML::CDATASection=SCALAR(0xa983c4)

      Here's a bunch of fun text that I want to get a substring out of.

-----
XML::LibXML::Text=SCALAR(0x9d75d4)

-----

Re^2: Parsing CData nodes with LibXML by silvertip257 (Initiate) on Apr 07, 2010 at 03:43 UTC
Isn't this funny... I'm responding to myself and solving my problem. I hope that this information will show up on search engines to save others the time that this parser behavior cost me. I have found out that the actual CData string that I want is stored in the second child node of my parent `child` element (the first child node is a whitespace-only text node). Here's a modified snippet from what I found.

my @ch = $root->getElementsByTagName("child");

my $type = $ch[0]->nodeType;
my $val = $ch[0]->firstChild->nodeValue;

my $type2 = ($ch[0]->childNodes)[1]->nodeType;
my $val2 = ($ch[0]->childNodes)[1]->nodeValue;

print $type."\n";
print $val."\n";
print $type2."\n";
print $val2."\n";

print "val Length = ".length($val)."\n";

Re^3: Parsing CData nodes with LibXML by Anonymous Monk on Apr 07, 2010 at 03:52 UTC
I thought that was obvious from my example :D and besides, everyone knows significant white-space is significant :)
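
For anyone landing here from a search: picking the second child node by index, as in the snippet above, works for this file but depends on exactly where the whitespace falls. A minimal sketch of an alternative is to walk the child nodes and keep only those whose `nodeType` is `XML_CDATA_SECTION_NODE`. The helper name `cdata_strings` is made up for this example and is not part of XML::LibXML; the trimming and the six-character substring are only illustrations of the "get a substring" step from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);    # :libxml imports node-type constants such as XML_CDATA_SECTION_NODE

# Hypothetical helper (not part of XML::LibXML): return the trimmed text of
# every CDATA section that is a direct child of the given element.
sub cdata_strings {
    my ($element) = @_;
    my @found;
    for my $node ( $element->childNodes ) {
        # Skip everything that is not a CDATA section, including the
        # whitespace-only text nodes on either side of it.
        next unless $node->nodeType == XML_CDATA_SECTION_NODE;
        my $text = $node->nodeValue;
        $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        push @found, $text;
    }
    return @found;
}

my $doc = XML::LibXML->load_xml( string => <<'__XML__' );
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>
    <![CDATA[
      Here's a bunch of fun text that I want to get a substring out of.
    ]]>
  </child>
</root>
__XML__

my ($child) = $doc->documentElement->getElementsByTagName('child');
my ($text)  = cdata_strings($child);

print "$text\n";
print substr( $text, 0, 6 ), "\n";    # first six characters, as an example substring
```

Filtering on node type should behave the same whether or not `keep_blanks` is turned off, since it never relies on the position of the CDATA node among its siblings.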