Re: split an xml file into pieces

Re^2: split an xml file into pieces by Anonymous Monk on Dec 13, 2011 at 09:35 UTC
"I might normally agree, except in this case the regex takes like two seconds to put together:...." They call this encouragement. It encourages follow-up questions of the exact same caliber.
Re^3: split an xml file into pieces by TJPride (Pilgrim) on Dec 13, 2011 at 10:04 UTC
When you really think about it, EVERY question people post on here is something they could solve by hiring the right programmer. This particular question only takes a few seconds to solve; many others, even well-thought-out posts, can take 20-30 minutes. Seems to me the latter are ripping us off more, in terms of potential money lost. I wouldn't help if it took me more than 5 minutes, but who can't spare a few seconds for even a stupid, lazy person?
Re^4: split an xml file into pieces by Anonymous Monk on Dec 13, 2011 at 11:15 UTC
"When you really think about it, EVERY question people post on here is something they could solve by hiring the right programmer....but who can't spare a few seconds for even a stupid, lazy person?" After hundreds and hundreds of such encounters, I'm of the opinion that keszler's approach is best (and cavac's too): spend those few seconds to ask for minimum effort, and wait to see if the OP plays along; it's no fun to play by yourself. It appears the OP in this thread is willing to play along, and that is very nice; I can't wait to see how it ends. Related discussion: To Answer, Or Not To Answer...., So it's homework - so what?, Do my homework for me!
Re^2: split an xml file into pieces by vamsi.padakandla (Initiate) on Dec 13, 2011 at 09:39 UTC
I too agree. If you look at the XML, there are some lines (between <ProductAttributes> and </ProductAttributes>) which we can't extract with <ProductAttributes>(.)</ProductAttributes>. Normally a regex will search for a match line by line, but here the <ProductAttributes> tag spans multiple lines. Here is my code:

    use XML::Simple;
    binmode(STDOUT, ":utf8");
    open (handle, "D:\\OlayNewProductCatalog_Interwoven_Products.xml");
    $xml= new XML::Simple;
    $data = $xml->XMLin("D:\\OlayNewProductCatalog_Interwoven_Products.xml");
    $data->{Product}->[0]->{ProductAttributes};
    open (handle, "D:\\OlayNewProductCatalog_Interwoven_Products.xml");
    $xmlstandard="<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
    $line1;
    $count=0;
    my @prdarry=();
    my $prd;
    while(<handle>) {
        if($_ =~ /(\<Product\>)(.)/) {
            print "<Product>";
            print $2."\n";
            $prd=$2;
            push(@prdarry,$2);
            if (/(\<SeoUrl\>)(.)(\<\/SeoUrl\>)/) {
                $filename=$2;
                chomp($filename);
                $filename=~ s/\s+$//;
                $fname=~ s/[^[:ascii:]]//g;
                $filename =~ tr/ /_/;
                $filename =~ tr/\//_/;
                $fname=~ s/[^[:ascii:]]//g;
                $filename = "$filename".".xml";
                open(FILE,"\>C:\\Users\\p.a.vamsi.krishna\\Desktop\\perl\\vamsi\\files\\$filename");
                print "Created $2.xml File..... \n" if(!$?);
                print FILE "<Product>";
                print FILE $prd."\n"."\t";
            }
            if($_ =~ /(\<ProductAttributes\>)(.)/) {print FILE $2;}
            if($_ =~ /(<)(.)/) {print FILE $2;}
            if($_ =~ /(.)(\<\/ProductAttributes\>)/) {print FILE $1;}
            print FILE $_ if($_ !~ /(\<)/ );
            if($_ !~ (/(\<)(.)/)||(/(.)(\>)/)) {
                #print FILE "No tags data....";
                $notag=$_;
                print FILE $2;
            }
            #if($_ =~ /(\<Product\>)(.)/)
            if($_ =~ /(.)(\<\/Product\>)/) {
                print "\n\n\n";
                print FILE $1;
                print FILE "</Product>";
            }
        }
    }
    close(FH);
    close(handle);
Re^3: split an xml file into pieces by keszler (Priest) on Dec 13, 2011 at 10:05 UTC
The text inside the ProductAttributes tags resembles HTML, as if it were extracted from a complete HTML page. As is, it's not valid in an XML file - see CDATA for the proper way to handle it. If you can change how the XML is generated, make it use CDATA. If you must use a regex, see Modifiers for how to make the regex handle multiple lines.
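As a rough illustration of the modifier advice, here is a minimal sketch that assumes the whole file has already been read into one string rather than processed line by line. The sample XML and the variable names are made up for the example, since the real file is not shown in the thread. The point is only that the /s modifier lets `.` match newlines, so a capture can span several lines.

```perl
use strict;
use warnings;

# Hypothetical sample; the real file's contents are not shown in the thread.
my $xml = <<'XML';
<Product>
<SeoUrl>night-cream</SeoUrl>
<ProductAttributes>
line one
line two
</ProductAttributes>
</Product>
XML

# Without /s, "." does not match a newline, so a multi-line element is missed.
my $without = $xml =~ m{<ProductAttributes>(.*?)</ProductAttributes>} ? 1 : 0;

# With /s, "." matches newlines too, so the capture can span several lines.
my ($attrs) = $xml =~ m{<ProductAttributes>(.*?)</ProductAttributes>}s;

print "matched without /s: $without\n";
print "captured with /s: $attrs\n";
```

This only shows how the modifier behaves. It does not make the HTML-like content valid XML, which is what the CDATA suggestion is about.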
Re^2: split an xml file into pieces by moritz (Cardinal) on Dec 13, 2011 at 10:35 UTC
... and it will fail horribly as soon as nested `Product` tags appear in the XML.
Perl 6 - second systems done right
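To make that failure mode concrete, here is a small sketch using a made-up string in which one `Product` element contains another. The input and the variable names are assumptions for illustration, not taken from the OP's file. A non-greedy match ends at the first closing tag it finds, and that tag belongs to the inner element.

```perl
use strict;
use warnings;

# Hypothetical input: an outer Product that contains a nested Product.
my $nested = '<Product>outer <Product>inner</Product> tail</Product>';

# The non-greedy match stops at the first </Product>, which closes the inner element.
my ($chunk) = $nested =~ m{(<Product>.*?</Product>)}s;

print "$chunk\n";
```

The captured piece has two opening tags but only one closing tag, so it is not a well-formed element on its own. A greedy `.*` would instead run to the last closing tag and merge sibling products, which is why a real XML parser is the safer tool once nesting is possible.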