"CDATA Parsing and XML"

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

How do I extract the CDATA section of an XML document using XML::DOM::Parser? In the XML below, I just need to extract the CDATA part and print it. Any help or pointers would be really appreciated.

Thanks,
Natarajan

Sample XML:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
 <ns1:e3 xmlns:ns1="urn:foo">
 <![CDATA[
<?xml
version="1.0" encoding="UTF-8"?> <CdmXML version="1.1"> <header> <schemaType>
CiscoDeviceManagement</schemaType> </header> <body> <deviceList type="saveall" method="console">
<device> <deviceInfo deviceId="10519" deviceName="fplab-9216a" deviceIP="172.24.111.5"
terminalServerName="fplab-2511a" terminalServerIP="172.25.24.245" terminalServerPort="2001" /> <job
type="saveall" method="console" /> <result status="SAVE ALL STARTED" message="" />
<additionalAttribute labTypeName="Building 13" solName="srid_test0801" userId="sridkris"
genlab_cltSolId="1373" need_config="true" cityName="San Jose (US)" llsPassword="$uNeVa1e"
solutionId="8254" llsIP="None" intTypeNo="9000" configSupported="true" llsExtDevTypeName="UNIX"
scheduleId="40179" llsUserName="labview" extTypeName="9XXX" need_firmware="false" cltSolId="8414"
llsCorporateIP="171.71.176.94" /> </device> </deviceList> </body> </CdmXML>

  ]]>
  </ns1:e3>
  </soapenv:Body>
  </soapenv:Envelope>
```

Re: "CDATA Parsing and XML" by pg (Canon) on Mar 30, 2003 at 09:34 UTC
If you just want the CDATA portion, there is really no point in using DOM. Instead, use XML::Parser or XML::Parser::Expat, and do something like this:

```perl
use strict;
use warnings;
use XML::Parser::Expat;

# Slurp the whole file so the byte offsets Expat reports index into it.
open(my $fh, '<', 'test.xml') or die "failed to open test.xml: $!";
my $xml_string = do { local $/; <$fh> };
close($fh);

my $parser = XML::Parser::Expat->new;
my $first;
$parser->setHandlers(CdataStart => \&start, CdataEnd => \&end);
$parser->parse($xml_string);

# Remember the byte offset where the CDATA section starts...
sub start { $first = $_[0]->current_byte; }

# ...and print the original text from there to where the section ends.
sub end {
    print substr($xml_string, $first, $_[0]->current_byte - $first + 1);
}
```
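For comparison, the same offset-and-`substr` idea can be sketched without a parser, using core `index` to locate the `<![CDATA[` and `]]>` delimiters directly. This is a minimal sketch under loud assumptions: the input is well-formed, and no `<![CDATA[` text appears inside XML comments (a real parser such as Expat handles that for you). The sample string, including the `ns1:e4` element, and the `@sections` name are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input; in practice this would be slurped from a file.
my $xml = <<'XML';
<soapenv:Body>
 <ns1:e3 xmlns:ns1="urn:foo"><![CDATA[<CdmXML version="1.1"/>]]></ns1:e3>
 <ns1:e4 xmlns:ns1="urn:foo"><![CDATA[second]]></ns1:e4>
</soapenv:Body>
XML

my $open  = '<![CDATA[';
my $close = ']]>';

# Walk the string, recording where each section's body starts and ends,
# then substr() out the text in between (delimiters not included).
# Assumes well-formed input with no '<![CDATA[' inside comments.
my @sections;
my $pos = 0;
while ((my $start = index($xml, $open, $pos)) >= 0) {
    my $body_start = $start + length $open;
    my $end        = index($xml, $close, $body_start);
    last if $end < 0;    # unterminated section: stop rather than guess
    push @sections, substr($xml, $body_start, $end - $body_start);
    $pos = $end + length $close;
}

print "$_\n" for @sections;
```

Unlike the Expat version, this returns only the section bodies, so there is no delimiter arithmetic to get right.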
Re: "CDATA Parsing and XML" by benn (Vicar) on Mar 30, 2003 at 18:13 UTC
Loath though I am to start getting labelled a crusty old-timer, as this is my 2nd 'avoid-de-overhead' post in as many weeks, there's always the regex / split / traditional text-processing approach to consider too...

```perl
use File::Slurp;    # my favourite module :)

my $cdata = read_file("foo.xml");
# /s lets . match newlines, so the section may span several lines.
$cdata =~ s/^.*?(<!\[CDATA\[.*?\]\]>).*$/$1/s;
```

Cheers, Ben.
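A non-destructive variant of the same regex idea, sketched on the assumption that only the first section is wanted: match instead of substituting, and capture just the contents between the delimiters. The `/s` modifier matters because the CDATA section in the question spans several lines; without it, `.` does not match a newline and the pattern finds nothing. The sample string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up multi-line CDATA section, shaped like the one in the question.
my $xml = <<'XML';
<ns1:e3 xmlns:ns1="urn:foo">
<![CDATA[
<CdmXML version="1.1">
</CdmXML>
]]>
</ns1:e3>
XML

# /s lets . match newlines; .*? stops at the first ]]>,
# so only the first section's contents are captured.
my ($first) = $xml =~ /<!\[CDATA\[(.*?)\]\]>/s;
print defined $first ? $first : "no CDATA found\n";
```

Because it is a plain match, `$xml` is left intact for any further processing.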
Re: Re: "CDATA Parsing and XML" by fokat (Deacon) on Mar 31, 2003 at 02:06 UTC
First of all, a ++ is in order because you showed another approach to the problem. I guess your example might not work as expected, though: the substitution keeps only a single match, so if the XML contains more than one `CDATA` section, only the first will be seen. I made slight adjustments:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $cdata = join('', <DATA>);

# Each pass unwraps one CDATA section and prints its contents.
while ($cdata =~ s/<!\[CDATA\[(.*?)\]\]>/$1/ms) {
    print "cdata = $1\n";
}

__DATA__
<?xml version="1.0" encoding="UTF-8" ?>
<![CDATA[This is the first cdata]]>
<![CDATA[This is the second cdata]]>
<some/>
```

So that it outputs...

```
cdata = This is the first cdata
cdata = This is the second cdata
```

Also, some dot-star hating is probably in order, but I can't think of a better regexp right now :)

Best regards

-lem, but some call me fokat
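To collect every section in one pass without modifying the string, a list-context `m//g` match may be simpler than the substitution loop. The sketch below reuses fokat's sample in a heredoc rather than `__DATA__`, and also shows one possible pattern without `.*?`: the body is any run of characters that are either not `]`, or a `]` that does not begin the closing `]]>`. This is a sketch, not an XML parser; like the other regex versions, it will also match `<![CDATA[` text that happens to sit inside an XML comment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fokat's sample, held in a heredoc instead of __DATA__.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8" ?>
<![CDATA[This is the first cdata]]>
<![CDATA[This is the second cdata]]>
<some/>
XML

# List-context m//g returns every capture at once and leaves $xml intact.
my @cdata = $xml =~ /<!\[CDATA\[(.*?)\]\]>/gs;
print "cdata = $_\n" for @cdata;

# Dot-star-free variant: [^\]] already matches newlines, so no /s is needed;
# a ']' is accepted only when it does not start the closing "]]>".
my @cdata_nodot = $xml =~ /<!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\]>/g;
```

Both patterns should yield the same two strings, and the `print` line produces the same output as fokat's loop.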