in reply to Re^2: Multiple XML files from Directory to One XML file using perl.
in thread Multiple XML files from Directory to One XML file using perl.
As for what you really want, which is one <shiporder ...> element containing all the content of all the files (that is, combining the "shipto" elements from all the input files into one "shiporder"), that's a different plan from what I was suggesting, and it would be best to use a parser for that.
In fact, it seems like the OP code is really pretty close to what you want. Here's my version, with Digest::MD5 thrown in to eliminate duplicate "shipto" content:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Carp;
    use File::Find;
    use File::Spec::Functions qw( canonpath );
    use XML::LibXML::Reader;
    use Digest::MD5 'md5';

    if ( @ARGV == 0 ) {
        push @ARGV, "C:/file/dir";
        warn "Using default path $ARGV[0]\n  Usage: $0 path ...\n";
    }

    # open an output file whose name won't be found by File::Find
    open( my $allxml, '>', "all_shiporders.xml.combined" )
        or die "can't open output xml file for writing: $!\n";
    print $allxml '<?xml version="1.0" encoding="UTF-8"?>',
        "\n<shiporder xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n";

    my %shipto_md5;    # digest of each "shipto" element already written
    find(
        sub {
            return unless ( /[.]xml\z/i and -f );
            extract_information();
            return;
        },
        @ARGV
    );
    print $allxml "</shiporder>\n";

    sub extract_information {
        my $path = $_;
        if ( my $reader = XML::LibXML::Reader->new( location => $path )) {
            while ( $reader->nextElement( 'shipto' )) {
                my $elem = $reader->readOuterXml();
                my $md5 = md5( $elem );
                # write each distinct "shipto" element only once
                print $allxml $elem unless ( $shipto_md5{$md5}++ );
            }
        }
        return;
    }

That seems to work on a set of files such as the following; the output leaves out the "shipto" from "j4.xml" because it's identical to the one in "j2.xml":
    ==> j1.xml <==
    <?xml version="1.0" encoding="UTF-8"?>
    <shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <shipto>
    <name>johan</name>
    <address>Langgt 23</address>
    </shipto>
    </shiporder>

    ==> j2.xml <==
    <?xml version="1.0" encoding="UTF-8"?>
    <shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <shipto>
    <name>benny</name>
    <address>galve 23</address>
    </shipto>
    </shiporder>

    ==> j3.xml <==
    <?xml version="1.0" encoding="UTF-8"?>
    <shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <shipto>
    <name>kent</name>
    <address>vadrss 25</address>
    </shipto>
    </shiporder>

    ==> j4.xml <==
    <?xml version="1.0" encoding="UTF-8"?>
    <shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <shipto>
    <name>benny</name>
    <address>galve 23</address>
    </shipto>
    </shiporder>

    ==> j5.xml <==
    <?xml version="1.0" encoding="UTF-8"?>
    <shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <shipto>
    <name>stewart</name>
    <address>vadrss 25</address>
    </shipto>
    </shiporder>
The combined output file:

    <?xml version="1.0" encoding="UTF-8"?>
    <shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <shipto>
    <name>johan</name>
    <address>Langgt 23</address>
    </shipto><shipto>
    <name>benny</name>
    <address>galve 23</address>
    </shipto><shipto>
    <name>kent</name>
    <address>vadrss 25</address>
    </shipto><shipto>
    <name>stewart</name>
    <address>vadrss 25</address>
    </shipto></shiporder>
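The duplicate check is easier to see in isolation. Here is a minimal sketch that uses only core modules; the helper name unique_by_digest and the sample strings are made up for illustration and are not part of the script above. Unlike that script, it encodes each string to UTF-8 bytes before hashing, because Digest::MD5 croaks when handed a string that contains wide characters.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );
use Encode qw( encode_utf8 );

# Hypothetical helper (not from the script above): return the input
# strings in their original order, keeping only the first occurrence
# of each distinct string.
sub unique_by_digest {
    my @fragments = @_;
    my %seen;      # digest => number of times seen so far
    my @unique;
    for my $fragment (@fragments) {
        # Digest::MD5 works on bytes, so encode character strings first.
        my $digest = md5_hex( encode_utf8($fragment) );
        push @unique, $fragment unless $seen{$digest}++;
    }
    return @unique;
}

# Made-up sample data standing in for serialized "shipto" elements.
my @shipto = (
    "<shipto><name>johan</name></shipto>",
    "<shipto><name>benny</name></shipto>",
    "<shipto><name>benny</name></shipto>",    # exact duplicate, dropped
    "<shipto><name>kent</name></shipto>",
);
print "$_\n" for unique_by_digest(@shipto);
```

Note that the comparison is byte-for-byte: two "shipto" elements that differ only in whitespace or attribute order hash differently and would both be kept. If that matters for your data, you would need to normalize each element before hashing it.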