in reply to split an xml file into pieces

Re^2: split an xml file into pieces
by Anonymous Monk on Dec 13, 2011 at 09:35 UTC

    I might normally agree, except in this case the regex takes like two seconds to put together:....

    They call this encouragement.

    It encourages follow-up questions of the exact same caliber.

      When you really think about it, EVERY question people post on here is something they could solve by hiring the right programmer. This particular question only takes a few seconds to solve; many others, even well-thought-out posts, can take 20-30 minutes. Seems to me the latter are ripping us off more, in terms of potential money lost.

      I wouldn't help if it took me more than 5 minutes, but who can't spare a few seconds for even a stupid, lazy person?

        When you really think about it, EVERY question people post on here is something they could solve by hiring the right programmer....but who can't spare a few seconds for even a stupid, lazy person?

        After hundreds and hundreds of such encounters, I'm of the opinion that keszler's approach is best (and cavac's too): spend those few seconds asking for a minimum of effort, and wait to see if the OP plays along; it's no fun to play by yourself.

        It appears the OP in this thread is willing to play along, and that is very nice; I can't wait to see how it ends.

        Related discussion: To Answer, Or Not To Answer...., So it's homework - so what?, Do my homework for me!

Re^2: split an xml file into pieces
by vamsi.padakandla (Initiate) on Dec 13, 2011 at 09:39 UTC

    I too agree.

    If you look at the XML, there are some lines (between <ProductAttributes> and </ProductAttributes>) which we can't extract with <ProductAttributes>(.*)</ProductAttributes>. Normally the regex searches for a match line by line, but here the content of the <ProductAttributes> tag spans multiple lines.

    Here is my code:
    use XML::Simple;
    binmode(STDOUT, ":utf8");
    open (handle, "D:\\OlayNewProductCatalog_Interwoven_Products.xml");
    $xml= new XML::Simple;
    $data = $xml->XMLin("D:\\OlayNewProductCatalog_Interwoven_Products.xml");
    $data->{Product}->[0]->{ProductAttributes};
    open (handle, "D:\\OlayNewProductCatalog_Interwoven_Products.xml");
    $xmlstandard="<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
    $line1;
    $count=0;
    my @prdarry=();
    my $prd;
    while(<handle>)
    {
        if($_ =~ /(\<Product\>)(.*)/)
        {
            print "<Product>";
            print $2."\n";
            $prd=$2;
            push(@prdarry,$2);
            if (/(\<SeoUrl\>)(.*)(\<\/SeoUrl\>)/)
            {
                $filename=$2;
                chomp($filename);
                $filename=~ s/\s+$//;
                $fname=~ s/[^[:ascii:]]//g;
                $filename =~ tr/ /_/;
                $filename =~ tr/\//_/;
                $fname=~ s/[^[:ascii:]]//g;
                $filename = "$filename".".xml";
                open(FILE,"\>C:\\Users\\p.a.vamsi.krishna\\Desktop\\perl\\vamsi\\files\\$filename");
                print "Created $2.xml File..... \n" if(!$?);
                print FILE "<Product>";
                print FILE $prd."\n"."\t";
            }
            if($_ =~ /(\<ProductAttributes\>)(.*)/) {print FILE $2;}
            if($_ =~ /(&lt;)(.*)/) {print FILE $2;}
            if($_ =~ /(.*)(\<\/ProductAttributes\>)/) {print FILE $1;}
            print FILE $_ if($_ !~ /(\<)/ );
            if($_ !~ (/(\<)(.*)/)||(/(.*)(\>)/))
            {
                #print FILE "No tags data....";
                $notag=$_;
                print FILE $2;
            }
            #if($_ =~ /(\<Product\>)(.*)/)
            if($_ =~ /(.*)(\<\/Product\>)/)
            {
                print "\n\n\n";
                print FILE $1;
                print FILE "</Product>";
            }
        }
    }
    close(FH);
    close(handle);

      The text inside the ProductAttributes tags resembles HTML, as if it were extracted from a complete HTML page. As is, it's not valid in an XML file - see CDATA for the proper way to handle it.

      If you can change how the XML is generated, make it use CDATA. If you must use a regex, see Modifiers for how to make the regex handle multiple lines.
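
      For the regex route, here is a minimal sketch of what the modifiers buy you. It assumes the whole file fits in memory and that <Product> elements are never nested. The sample XML and the helper names seo_to_filename and split_products are made up for illustration; they are not from the original post. The key points are matching against the whole document instead of one line at a time, and using /s so that "." also matches newlines.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real catalog file is not shown in the thread.
my $xml = <<'END_XML';
<Products>
<Product>
<SeoUrl>day cream/spf 15</SeoUrl>
<ProductAttributes>&lt;p&gt;first line
second line&lt;/p&gt;</ProductAttributes>
</Product>
<Product>
<SeoUrl>night-serum</SeoUrl>
<ProductAttributes>single line</ProductAttributes>
</Product>
</Products>
END_XML

# Build a file name from an SeoUrl value: trim it, then turn
# spaces and slashes into underscores (as the posted code does).
sub seo_to_filename {
    my ($seo) = @_;
    $seo =~ s/^\s+|\s+$//g;
    $seo =~ tr{ /}{__};
    return "$seo.xml";
}

# Split the whole document into <Product> elements.
#   /s  lets "." match newlines, so a match can span several lines
#   /g  finds every Product, not just the first
#   .*? is non-greedy, so each match ends at the nearest </Product>
sub split_products {
    my ($text) = @_;
    my %pieces;
    while ($text =~ m{(<Product>.*?</Product>)}sg) {
        my $product = $1;    # save $1 before the next match overwrites it
        my ($seo) = $product =~ m{<SeoUrl>(.*?)</SeoUrl>}s or next;
        $pieces{ seo_to_filename($seo) } = $product;
    }
    return \%pieces;         # file name => XML fragment
}

my $pieces = split_products($xml);
for my $name (sort keys %$pieces) {
    my ($attrs) = $pieces->{$name} =~ m{<ProductAttributes>(.*?)</ProductAttributes>}s;
    my $lines = 1 + ($attrs =~ tr/\n//);    # newlines inside the element, plus one
    print "$name: ProductAttributes spans $lines line(s)\n";
}
```

      To use this on a real file you would read it in one go (for example with local $/ set to undef) and write each fragment out under its file name instead of printing a summary. A real XML parser will cope better with attributes, comments and nesting than this sketch does.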

Re^2: split an xml file into pieces
by moritz (Cardinal) on Dec 13, 2011 at 10:35 UTC
    A reply falls below the community's threshold of quality. You may see it by logging in.