Re: "CDATA Parsing and XML"

Loath though I am to get labelled a crusty old-timer, this being my 2nd 'avoid-de-overhead' post in as many weeks, there's always the regex / split / traditional text-processing approach to consider too...

use File::Slurp; #my favourite module :)
my $cdata = read_file("foo.xml");
# /s lets . match newlines, so this also works when the file spans several lines
$cdata =~ s/^.*?(<!\[CDATA\[.*?\]\]>).*$/$1/s;

Cheers, Ben.


Replies are listed 'Best First'.

Re: Re: "CDATA Parsing and XML"
by fokat (Deacon) on Mar 31, 2003 at 02:06 UTC

First of all, a ++ is in order because you've shown another approach to the problem.

I guess your example might not work as expected: the trailing .* is greedy and swallows everything after the first match, so if the XML contains more than one CDATA section, only the first will be seen. I made slight adjustments as shown below...

#!/usr/bin/perl

use strict;
use warnings;

my $cdata = join('', <DATA>);

while ($cdata =~ s/<!\[CDATA\[(.*?)\]\]>/$1/ms)
{
    print "cdata = $1\n";
}

__DATA__
<?xml version="1.0" encoding="UTF-8" ?> 
<![CDATA[This is the first cdata]]>
<![CDATA[This is the second cdata]]>
<some/>

So that it outputs...

cdata = This is the first cdata
cdata = This is the second cdata

Also, hating on dot star is probably in order, but I can't think of a better regexp right now :)
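For what it's worth, here is one possible way to drop the dot star, offered as a sketch rather than a definitive answer. It spells out the content as "any character that is not `]`, or a `]` that is not followed by `]>`", and uses a list-context `//g` match so the input string is left unmodified. The helper name `cdata_sections` and the `$xml` variable are made up for illustration; they are not from the thread.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the contents of every
# CDATA section found in $xml. Call it in list context. $xml is not modified.
sub cdata_sections {
    my ($xml) = @_;

    # Content is a run of characters that never begins the "]]>" terminator:
    # either a non-"]" character, or a "]" that is not followed by "]>".
    # A negated character class matches newlines, so no /s is needed.
    return $xml =~ /<!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\]>/g;
}

# Same sample input as the __DATA__ section above.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8" ?>
<![CDATA[This is the first cdata]]>
<![CDATA[This is the second cdata]]>
<some/>
XML

print "cdata = $_\n" for cdata_sections($xml);
```

On the sample input this prints the same two `cdata = ...` lines as the substitution loop. Like any regex-only approach, it treats the file as plain text, so something that merely looks like a CDATA section inside an XML comment would be picked up as well.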

Best regards

-lem, but some call me fokat
