Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, how do I extract the CDATA section of an XML document using XML::DOM::Parser? In the enclosed XML I just need to extract the CDATA part and print it. Any help or pointers will be really appreciated.

Thanks, Natarajan

Sample XML:
<?xml version="1.0" encoding="UTF-8" ?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soapenv:Body>
    <ns1:e3 xmlns:ns1="urn:foo">
      <![CDATA[ <?xml version="1.0" encoding="UTF-8"?>
      <CdmXML version="1.1">
        <header>
          <schemaType> CiscoDeviceManagement</schemaType>
        </header>
        <body>
          <deviceList type="saveall" method="console">
            <device>
              <deviceInfo deviceId="10519" deviceName="fplab-9216a"
                  deviceIP="172.24.111.5" terminalServerName="fplab-2511a"
                  terminalServerIP="172.25.24.245" terminalServerPort="2001" />
              <job type="saveall" method="console" />
              <result status="SAVE ALL STARTED" message="" />
              <additionalAttribute labTypeName="Building 13" solName="srid_test0801"
                  userId="sridkris" genlab_cltSolId="1373" need_config="true"
                  cityName="San Jose (US)" llsPassword="$uNeVa1e" solutionId="8254"
                  llsIP="None" intTypeNo="9000" configSupported="true"
                  llsExtDevTypeName="UNIX" scheduleId="40179" llsUserName="labview"
                  extTypeName="9XXX" need_firmware="false" cltSolId="8414"
                  llsCorporateIP="171.71.176.94" />
            </device>
          </deviceList>
        </body>
      </CdmXML> ]]>
    </ns1:e3>
  </soapenv:Body>
</soapenv:Envelope>
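Since the question asks about XML::DOM specifically, here is a minimal sketch of that route for comparison with the replies below. It assumes XML::DOM is installed; `cdata_from_dom` is a hypothetical helper name. I have not verified whether XML::DOM::Parser keeps the section as a CDATASection node or folds it into a Text node under its default settings, so the sketch accepts both and concatenates their data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::DOM;    # also exports the node-type constants used below

# Hypothetical helper: return the character data held by every element
# named $tag (e.g. 'ns1:e3'). XML::DOM is not namespace-aware, so the
# prefixed name is matched literally.
sub cdata_from_dom {
    my ($xml, $tag) = @_;
    my $parser = XML::DOM::Parser->new;
    my $doc    = $parser->parse($xml);
    my @found;
    for my $elem ($doc->getElementsByTagName($tag)) {
        my $text = '';
        for my $child ($elem->getChildNodes) {
            my $type = $child->getNodeType;
            # Assumption: the CDATA may arrive as either node type.
            $text .= $child->getData
                if $type == CDATA_SECTION_NODE() || $type == TEXT_NODE();
        }
        push @found, $text;
    }
    $doc->dispose;    # XML::DOM trees contain circular references
    return @found;
}
```

With the sample document in `$xml_string`, `print "$_\n" for cdata_from_dom($xml_string, 'ns1:e3');` would print the embedded CdmXML text, surrounding whitespace included.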

Re: "CDATA Parsing and XML"
by pg (Canon) on Mar 30, 2003 at 09:34 UTC
    If you just want the CDATA portion, there is really no point in using DOM. Instead, use XML::Parser or XML::Parser::Expat and do something like this:
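    use strict;
    use warnings;
    use XML::Parser::Expat;

    # Slurp the whole document: with $/ undefined, <$fh> returns everything.
    open(my $fh, '<', 'test.xml') or die "failed to open test.xml: $!";
    my $xml_string = do { local $/; <$fh> };
    close($fh);

    my $parser = XML::Parser::Expat->new;
    my $first;
    $parser->setHandlers(CdataStart => \&start, CdataEnd => \&end);
    $parser->parse($xml_string);
    $parser->release;    # break Expat's circular references

    # current_byte is the offset of the event being reported:
    # the start of "<![CDATA[" here...
    sub start { $first = $_[0]->current_byte; }

    # ...and the first byte of "]]>" here, so add 3 to keep the whole closer.
    sub end {
        print substr($xml_string, $first, $_[0]->current_byte - $first + 3);
    }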
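The byte-offset trick returns the section together with its `<![CDATA[` and `]]>` delimiters. If only the text inside is wanted, a sketch that relies on Expat reporting CDATA contents through the `Char` handler might look like this. It assumes XML::Parser (which provides XML::Parser::Expat) is installed; `cdata_sections` is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser::Expat;

# Hypothetical helper: collect the contents of every CDATA section,
# without the delimiters, using Expat's CdataStart/CdataEnd/Char events.
sub cdata_sections {
    my ($xml) = @_;
    my (@sections, $in_cdata);
    my $parser = XML::Parser::Expat->new;
    $parser->setHandlers(
        CdataStart => sub { $in_cdata = 1; push @sections, '' },
        CdataEnd   => sub { $in_cdata = 0 },
        # Char may fire several times per section, so append, don't assign.
        Char       => sub { $sections[-1] .= $_[1] if $in_cdata },
    );
    $parser->parse($xml);
    $parser->release;    # break Expat's circular references
    return @sections;
}

my $xml = '<doc><![CDATA[first <b>raw</b>]]><x/><![CDATA[second]]></doc>';
print "[$_]\n" for cdata_sections($xml);
```

This should print `[first <b>raw</b>]` and `[second]` on separate lines; markup inside a CDATA section comes back as plain text.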
Re: "CDATA Parsing and XML"
by benn (Vicar) on Mar 30, 2003 at 18:13 UTC
    Loath though I am to start getting labelled a crusty old-timer, as this is my second 'avoid-the-overhead' post in as many weeks, there's always the regex / split / traditional text-processing approach to consider too...
    use File::Slurp;    # my favourite module :)

    my $cdata = read_file("foo.xml");
    # /s lets . match newlines, so this also works on a multi-line file
    $cdata =~ s/^.*?(<!\[CDATA\[.*?\]\]>).*$/$1/s;
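Because the substitution keeps only `$1`, a plain match with a capture does the same job without rewriting the string. A minimal core-Perl sketch (no File::Slurp; `slurp` and `first_cdata` are hypothetical helper names) that writes a throwaway file so it runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Core-Perl stand-in for File::Slurp's read_file: with the input record
# separator $/ undefined, one readline returns the whole file.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;
    my $text = <$fh>;
    close $fh;
    return $text;
}

# First CDATA section, delimiters included; /s lets .*? cross newlines.
# Returns undef when the text has no CDATA section.
sub first_cdata {
    my ($text) = @_;
    return $text =~ /(<!\[CDATA\[.*?\]\]>)/s ? $1 : undef;
}

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "<a>\n<![CDATA[line one\nline two]]>\n</a>\n";
close $fh;
print first_cdata(slurp($path)), "\n";
```

Only the first section is returned, just like the substitution; the match also fails cleanly (returns undef) instead of leaving the whole file in the variable.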
    Cheers, Ben.

      First of all, a ++ is in order because you have shown another approach to the problem.

      I guess your example might not work as expected: the pattern is anchored at both ends and the substitution replaces the whole string with a single capture, so if the XML contains more than one CDATA section, only the first will be seen. I made slight adjustments, as shown below...

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $cdata = join('', <DATA>);
      # Each pass replaces one CDATA section with its contents,
      # until no section is left.
      while ($cdata =~ s/<!\[CDATA\[(.*?)\]\]>/$1/ms) {
          print "cdata = $1\n";
      }

      __DATA__
      <?xml version="1.0" encoding="UTF-8" ?>
      <![CDATA[This is the first cdata]]>
      <![CDATA[This is the second cdata]]>
      <some/>

      So that it outputs...

      cdata = This is the first cdata
      cdata = This is the second cdata
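The loop rewrites the string on every pass. A non-destructive alternative, sketched here with a hypothetical `all_cdata` helper and the same sample text inlined as a heredoc, lets a list-context `m//g` return every capture at once:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: in list context, m//g returns $1 for every match,
# so the input is never modified and no while loop is needed.
sub all_cdata {
    my ($text) = @_;
    return $text =~ /<!\[CDATA\[(.*?)\]\]>/gs;
}

my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8" ?>
<![CDATA[This is the first cdata]]>
<![CDATA[This is the second cdata]]>
<some/>
XML

print "cdata = $_\n" for all_cdata($xml);
```

It prints the same two `cdata = ...` lines; the sub must be called in list context (as `for` does) to get all captures.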

      Also, some dot-star hating is probably in order, but I can't think of a better regexp right now :)
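One dot-star-free option is to spell out what the body may contain: any character except `]`, or a `]` that does not begin the `]]>` terminator (a negative lookahead). This is a sketch of that idea rather than something from the thread; since `[^\]]` also matches newlines, no `/s` is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Body = run of chars that are either not ']' or a ']' not starting ']]>'.
# Single-character alternatives keep backtracking linear on bad input.
my $CDATA_RE = qr/<!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\]>/;

my $text = '<a><![CDATA[x]]]><![CDATA[ a ] b ]]></a>';
print "[$_]\n" for $text =~ /$CDATA_RE/g;
```

This should print `[x]]` and `[ a ] b ]`: a stray `]` inside the section is kept, and the match always stops at the first `]]>`, which is where XML itself ends a CDATA section.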

      Best regards

      -lem, but some call me fokat