silvertip257 has asked for the wisdom of the Perl Monks concerning the following question:

I have decided to use XML::LibXML to parse XML, but I have had difficulty getting any output from CDATA nodes. Below is a small test case that I tried to use to get something working. So far I've had no luck. Any help is welcome, but I want to come to a solution using LibXML (not other parsers).

Thanks, Mike

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>
    <![CDATA[
    Here's a bunch of fun text that I want to get a substring out of.
    ]]>
  </child>
</root>

#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;

my $in = "testcd.xml";
my $parser = XML::LibXML->new;
my $doc = $parser->parse_file( $in ) or die("Cannot parse input file");
my $root = $doc->documentElement();

#my $val = ($root->getElementsByTagName("child"))[0]->nodeValue;
my @ch = $root->getElementsByTagName("child");
my $val = $ch[0]->firstChild->nodeType;
print $val;

my $cdata = XML::LibXML::CDATASection->new(($root->getElementsByTagName("child"))[0]->nodeV$
$cdata = $cdata->data;
print $cdata;

exit 0;

Re: Parsing CData nodes with LibXML
by Your Mother (Archbishop) on Apr 07, 2010 at 04:42 UTC

    Don't forget about keep_blanks. With it turned off, the parser tosses the whitespace-only text nodes.

    use XML::LibXML;

    my $parser = XML::LibXML->new;
    $parser->keep_blanks(0);
    my $doc = $parser->parse_string(<<'__XML__');
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <child>
        <![CDATA[
        Here's a bunch of fun text that I want to get a substring out of.
        ]]>
      </child>
    </root>
    __XML__

    print $doc->getDocumentElement->firstChild->textContent, $/;
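    To see what keep_blanks actually changes, here is a minimal sketch that parses the same string once per setting and lists the children of <child>. The shortened CDATA text and the %count hash are illustrative assumptions, not part of the original test case. With the default setting, the whitespace around the CDATA section shows up as separate text nodes.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'__XML__';
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>
    <![CDATA[ some text ]]>
  </child>
</root>
__XML__

# Parse the same string twice, once per keep_blanks setting,
# and record how many child nodes <child> ends up with.
my %count;
for my $keep (1, 0) {
    my $parser = XML::LibXML->new;
    $parser->keep_blanks($keep);
    my $doc = $parser->parse_string($xml);

    my ($child) = $doc->documentElement->getElementsByTagName('child');
    my @kids    = $child->childNodes;
    $count{$keep} = scalar @kids;

    # ref() gives the node class, e.g. XML::LibXML::Text
    print "keep_blanks($keep): ", scalar(@kids), " child node(s): ",
        join(', ', map { ref $_ } @kids), "\n";
}
```

    Comparing the two lines of output shows which nodes are whitespace-only filler and which one carries the data.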
Re: Parsing CData nodes with LibXML
by Anonymous Monk on Apr 07, 2010 at 03:32 UTC
    Don't forget about the examples and tests :) http://search.cpan.org/dist/XML-LibXML/MANIFEST
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use XML::LibXML;

    Main(@ARGV);
    exit(0);

    sub Main {
        my $doc = XML::LibXML->load_xml( string => <<'__XML__');
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <child>
        <![CDATA[
        Here's a bunch of fun text that I want to get a substring out of.
        ]]>
      </child>
    </root>
    __XML__
        for my $nod ( $doc->findnodes('/root/child/text()') ) {
            print "$nod\n", $nod->string_value, "\n-----\n";
        }
    } ## end sub Main
    __END__
    XML::LibXML::Text=SCALAR(0xa12c14)

    -----
    XML::LibXML::CDATASection=SCALAR(0xa983c4)

    Here's a bunch of fun text that I want to get a substring out of.

    -----
    XML::LibXML::Text=SCALAR(0x9d75d4)

    -----

      Isn't this funny...I'm responding to myself and solving my problem. I hope that this information will show up on search engines to save others the time that this parser behavior cost me.

      I have found out that the actual CDATA string I want is stored in the second child node of my <child> element. The first child is a whitespace-only text node, which is why firstChild gave me nothing useful.

      Here's a modified snippet from what I found.

      my @ch = $root->getElementsByTagName("child");
      my $type = $ch[0]->nodeType;
      my $val = $ch[0]->firstChild->nodeValue;
      my $type2 = ($ch[0]->childNodes)[1]->nodeType;
      my $val2 = ($ch[0]->childNodes)[1]->nodeValue;
      print $type."\n";
      print $val."\n";
      print $type2."\n";
      print $val2."\n";
      print "val Length = ".length($val)."\n";
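      Relying on index [1] breaks as soon as the whitespace in the file changes, so one alternative is to pick the CDATA child by node type instead of by position. This is only a sketch: it embeds the test document as a string rather than reading testcd.xml, and the trimming and the six-character substring are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);   # imports the XML_*_NODE node type constants

my $doc = XML::LibXML->new->parse_string(<<'__XML__');
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>
    <![CDATA[
    Here's a bunch of fun text that I want to get a substring out of.
    ]]>
  </child>
</root>
__XML__

my ($child) = $doc->documentElement->getElementsByTagName('child');

# Select the CDATA section by its node type, whatever its position
# among the surrounding whitespace text nodes.
my ($cdata) = grep { $_->nodeType == XML_CDATA_SECTION_NODE } $child->childNodes;
die "No CDATA section found under <child>\n" unless defined $cdata;

my $text = $cdata->nodeValue;
$text =~ s/^\s+|\s+$//g;       # trim leading and trailing whitespace

# Arbitrary example substring: the first six characters.
print substr($text, 0, 6), "\n";
```

      The same filter works unchanged whether keep_blanks is on or off, since it never depends on how many text nodes sit around the CDATA section.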
        I thought that was obvious from my example :D and besides, everyone knows significant white-space is significant :)