diomedea has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,
How do I parse multiple XML files with XML::Twig?
I am sure I'm missing something really obvious here, but I have read and reread the Twig documentation, googled and perlmonked, and I can't discover why, when I run this, only file2.xml gets parsed.
How can I 'reset' the twig to parse the next file? Has anyone else experienced this sort of problem, and if so, how did you solve it? Thanks ;+)

    @xmlfiles = qw(file1.xml file2.xml);

    foreach my $xfile (@xmlfiles) {
        $twig = new XML::Twig(
            twig_roots => {
                'pdb_entry/pdb_code'         => \&pdb_entry,
                'pdb_entry/total_asm'        => \&pdb_entry,
                'pdb_entry/asm_set/assembly' => \&assembly,
            });
        &parse_twig($xfile);
    }

    sub parse_twig {
        my $file = shift;
        if ($twig->safe_parsefile($file) == 0) {
            die "Failed to parse $file: $@";
        }
        my $root   = $twig->root;
        my @params = $root->children('pdb_entry');
        $twig->purge;
        $twig->dispose;
    }

    ... handler code here

Re: parsing multiple xml files with xml::twig
by citromatik (Curate) on Jun 07, 2007 at 16:19 UTC

    Did you "use strict" and "use warnings"?

    You are defining the variable $twig in the foreach loop, but trying to access it in the "parse_twig" sub (i.e. out of scope!).

    You will save (thanx Jenda ;) a lot of time if you ALWAYS use the "use strict" and "use warnings" pragmas in your scripts.
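
    A minimal sketch of the idea, not a diagnosis of the script above: each file gets its own lexically scoped twig inside the sub, and results are stored per file name so that one file cannot silently overwrite another. The sample XML, the <entries> root element, parse_file and %codes_for are all hypothetical names made up for illustration; only the 'pdb_entry/pdb_code' path comes from the post.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);
    use XML::Twig;

    # Hypothetical sample data standing in for file1.xml and file2.xml.
    my $dir    = tempdir(CLEANUP => 1);
    my %sample = (
        'file1.xml' => '<entries><pdb_entry><pdb_code>1abc</pdb_code></pdb_entry></entries>',
        'file2.xml' => '<entries><pdb_entry><pdb_code>2xyz</pdb_code></pdb_entry></entries>',
    );
    for my $name (keys %sample) {
        open my $fh, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
        print {$fh} $sample{$name};
        close $fh or die "Can't close $dir/$name: $!";
    }

    # Results are keyed by file name, so each file keeps its own list.
    my %codes_for;

    sub parse_file {
        my ($file) = @_;

        # A fresh, lexically scoped twig per file; the handler closes over $file.
        my $twig = XML::Twig->new(
            twig_roots => {
                'pdb_entry/pdb_code' => sub {
                    my ($t, $elt) = @_;
                    push @{ $codes_for{$file} }, $elt->text;
                    $t->purge;    # free what has been handled so far
                },
            },
        );

        # safe_parsefile returns false on failure and leaves the error in $@.
        $twig->safe_parsefile($file)
            or die "Failed to parse $file: $@";
        return;
    }

    parse_file("$dir/$_") for sort keys %sample;

    for my $file (sort keys %codes_for) {
        print "$file: @{ $codes_for{$file} }\n";
    }
    ```

    If both files show up here but not in the real script, the handlers and whatever they write to (shared hashes, output files) would be the next place to look.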

    citromatik

      Thanks. Yes, I did have use strict and warnings enabled. I'm not sure it's a scoping issue, as $twig is a global and the script does run without giving errors, just not with the results I expected!
      If I put all the sub code into the foreach loop I still get the same problem, namely that the last file is the only one which is apparently parsed. I left some bits of code out before, as I didn't think they were relevant and was trying to avoid inflating the post with unimportant details. Here's the fuller version:

      #! /usr/bin/perl -wT
      use strict;
      use warnings;
      use Fcntl;
      use Getopt::Long;
      use POSIX;
      use XML::Twig;

      ### UNTAINT ENVIRONMENT, SUPPLIED FILENAMES
      # code to untaint env, filenames here

      my $twig;
      my ($xmlfile, $ctlfile, $config);
      my (@assm_header, @chain_header, @xmlfiles);
      my %assm_header  = ();
      my %assm_line    = ();
      my %pdb_line     = ();
      my %chains       = ();
      my %chain_header = ();

      ### HANDLE COMMAND LINE OPTIONS
      # code to handle Getopt::Long here

      @xmlfiles = qw(file1.xml file2.xml);

      foreach my $xfile (@xmlfiles) {
          $twig = new XML::Twig(
              twig_roots => {
                  'pdb_entry/pdb_code'         => \&pdb_entry,
                  'pdb_entry/total_asm'        => \&pdb_entry,
                  'pdb_entry/asm_set/assembly' => \&assembly,
              });

          # &parse_twig($xfile);   # sub code now inserted below
          ## sub code folded into here
          if ($twig->safe_parsefile($xfile) == 0) {
              die "Failed to parse $xfile: $@";
          }
          my $root   = $twig->root;
          my @params = $root->children('pdb_entry');
          $twig->purge;
          $twig->dispose;
      }

      ... handler code

      The result is that file2 gets parsed and output, with no sign of file1.. ;+(

      I have managed to get a workaround by calling the parser from another program which feeds it one file at a time:

      #! /usr/bin/perl
      use strict;
      use warnings;
      use Fcntl;
      use Getopt::Long;
      use POSIX;

      my ($xmlfile, $ctlfile, $config);
      my @xmlfiles;

      ### HANDLE COMMAND LINE OPTIONS
      GetOptions('config|c=s'    => \$config,
                 'ctlfile|ctl=s' => \$ctlfile,
                 'xmlfile|x=s'   => \$xmlfile,);

      if ($xmlfile)        # xmlfile specified on cmd line
      {
          # untaint: keep only a plain "name.ext" style filename
          ($xmlfile) = $xmlfile =~ /^(\w+\.*\w+)$/i;
          push @xmlfiles, $xmlfile;
      }
      elsif ($config)      # xmlfiles in config file
      {
          sysopen(XMLFILES, $config, O_RDONLY)
              or die "Can't open $config : $!";
          my @config = <XMLFILES>;
          foreach my $line (@config) {
              # skip comment lines and blank lines
              next if (($line =~ /^#+/) || ($line =~ /^\s+$/));
              chomp($line);
              push @xmlfiles, $line;
          }
          close XMLFILES;
      }
      else                 # try default filename then bail out
      {
          $xmlfile = "multimers.pisa"
              if (-e "multimers.pisa" and -s "multimers.pisa");
          if ($xmlfile) {
              print "Using default xml file $xmlfile\n";
              ($xmlfile) = $xmlfile =~ /^(\w+\.*\w+)$/i;
              push @xmlfiles, $xmlfile;
          }
          else {
              die "No valid xmlfile specified...exiting\n";
          }
      }

      foreach my $file (@xmlfiles) {
          system "xml_pisa.pl $file";   # this calls the original twig parsing script
      }
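
      If the one-process-per-file workaround stays, it may be worth checking that each run actually succeeded, since a bare system call ignores the exit status. A minimal sketch, assuming nothing beyond core Perl: run_ok is a hypothetical helper, and the running perl ($^X) is used as a stand-in command so the example runs without xml_pisa.pl being present.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: run a command without going through the shell and
      # return 1 if it exited with status 0, otherwise warn and return 0.
      sub run_ok {
          my @cmd = @_;

          # The block form always bypasses the shell, even for a one-element list.
          my $rc = system { $cmd[0] } @cmd;

          if ($rc == -1) {
              warn "Could not start $cmd[0]: $!\n";
              return 0;
          }
          if ($? & 127) {
              warn sprintf "%s died with signal %d\n", $cmd[0], $? & 127;
              return 0;
          }
          my $exit = $? >> 8;
          warn "$cmd[0] exited with status $exit\n" if $exit;
          return $exit == 0 ? 1 : 0;
      }

      my @xmlfiles = qw(file1.xml file2.xml);
      my @failed;
      for my $file (@xmlfiles) {
          # Stand-in command so the sketch runs anywhere; in the script above
          # this would be something like run_ok('xml_pisa.pl', $file).
          run_ok($^X, '-e', 'exit 0', $file) or push @failed, $file;
      }
      print "Failed: @failed\n" if @failed;
      ```

      Passing the command as a list also means a filename from the config file is never interpreted by the shell.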