in reply to Re: split an xml file into pieces
in thread split an xml file into pieces

I too agree.

If you look at the XML, there are several lines between <ProductAttributes> and </ProductAttributes> that we can't extract with <ProductAttributes>(.*)</ProductAttributes>. The regex is applied one line at a time, but here the content of the <ProductAttributes> tag spans multiple lines.

Here is my code:
use XML::Simple;
binmode(STDOUT, ":utf8");
open (handle, "D:\\OlayNewProductCatalog_Interwoven_Products.xml");
$xml= new XML::Simple;
$data = $xml->XMLin("D:\\OlayNewProductCatalog_Interwoven_Products.xml");
$data->{Product}->[0]->{ProductAttributes};
open (handle, "D:\\OlayNewProductCatalog_Interwoven_Products.xml");
$xmlstandard="<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
$line1;
$count=0;
my @prdarry=();
my $prd;
while(<handle>)
{
    if($_ =~ /(\<Product\>)(.*)/)
    {
        print "<Product>";
        print $2."\n";
        $prd=$2;
        push(@prdarry,$2);
        if (/(\<SeoUrl\>)(.*)(\<\/SeoUrl\>)/)
        {
            $filename=$2;
            chomp($filename);
            $filename=~ s/\s+$//;
            $fname=~ s/[^[:ascii:]]//g;
            $filename =~ tr/ /_/;
            $filename =~ tr/\//_/;
            $fname=~ s/[^[:ascii:]]//g;
            $filename = "$filename".".xml";
            open(FILE,"\>C:\\Users\\p.a.vamsi.krishna\\Desktop\\perl\\vamsi\\files\\$filename");
            print "Created $2.xml File..... \n" if(!$?);
            print FILE "<Product>";
            print FILE $prd."\n"."\t";
        }
        if($_ =~ /(\<ProductAttributes\>)(.*)/) {print FILE $2;}
        if($_ =~ /(&lt;)(.*)/) {print FILE $2;}
        if($_ =~ /(.*)(\<\/ProductAttributes\>)/) {print FILE $1;}
        print FILE $_ if($_ !~ /(\<)/ );
        if($_ !~ (/(\<)(.*)/)||(/(.*)(\>)/))
        {
            #print FILE "No tags data....";
            $notag=$_;
            print FILE $2;
        }
        #if($_ =~ /(\<Product\>)(.*)/)
        if($_ =~ /(.*)(\<\/Product\>)/)
        {
            print "\n\n\n";
            print FILE $1;
            print FILE "</Product>";
        }
    }
}
close(FH);
close(handle);

Re^3: split an xml file into pieces
by keszler (Priest) on Dec 13, 2011 at 10:05 UTC

    The text inside the ProductAttributes tags resembles HTML, as if it were extracted from a complete HTML page. As is, it's not valid in an XML file - see CDATA for the proper way to handle it.
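
To illustrate the CDATA suggestion, here is a minimal sketch of how whoever generates the XML might wrap the markup. The helper name cdata_wrap and the sample string are hypothetical and not taken from the original file; the only XML rule relied on is that a CDATA section ends at the first "]]>".

```perl
use strict;
use warnings;

# Hypothetical helper: wrap raw markup in a CDATA section so an XML
# parser treats it as character data instead of child elements.
sub cdata_wrap {
    my ($text) = @_;
    # A literal "]]>" would end the section early, so split it across
    # two adjacent CDATA sections.
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

# Made-up sample content standing in for the real attribute text.
my $html = '<p>Some <b>bold</b> text</p>';
print '<ProductAttributes>', cdata_wrap($html), "</ProductAttributes>\n";
```

With the content wrapped this way, a parser such as XML::Simple should hand back the inner markup as one plain string.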

    If you can change how the XML is generated, make it use CDATA. If you must use a regex, see Modifiers for how to make the regex handle multiple lines.
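
As a rough sketch of the regex route, the /s modifier makes "." match newlines, so one pattern can span the whole block once the text is held in a single string. The sample XML below is invented for illustration; with the real file you would read it into one scalar first (for example by localizing $/ to undef before reading the handle) rather than looping line by line.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real file contents.
my $xml = <<'END_XML';
<Product>
<SeoUrl>sample-product</SeoUrl>
<ProductAttributes>
&lt;p&gt;line one&lt;/p&gt;
&lt;p&gt;line two&lt;/p&gt;
</ProductAttributes>
</Product>
END_XML

# /s lets "." match newlines; /g collects every block; ".*?" is
# non-greedy, so each match stops at the first closing tag.
my @blocks = $xml =~ m{<ProductAttributes>(.*?)</ProductAttributes>}sg;

print scalar(@blocks), " block(s) found\n";
print $blocks[0];
```

The non-greedy quantifier matters once the file holds more than one <Product>: a greedy ".*" under /s would run from the first opening tag to the last closing tag.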