in reply to I don't understand why I'm getting an "Use of uninitialized value" error

The problem you asked about has been solved by jwkrahn and ikegami above, and GrandFather has recommended his favorite XML parsing method. That leaves it to me to point out two other things:
  1. Your algorithm is inefficient to the point of being somewhat embarrassing: given 8 tags of interest, you read every line of every file 8 times, looking for each tag in turn. That does not scale well when dealing with more tags and/or more (and/or larger) files.
  2. If you think XML::Twig's mile-long manual is overkill, I agree, but you should still be using some sort of XML parsing approach, because That's The Right Way To Do It (and There's More Than One Way To Do It with XML parsing).

Here's a simple way, which I've tested on a directory containing a couple of XML files that probably have enough in common with the ones you have. Reading the first 250 lines of the XML::Parser manual was sufficient to know how to write this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Parser;

    my ( $tagname, $tagtext );  # "globals" used in callback subs

    my @target_tags  = qw(ID TimeStamp IP_Address Title Complainant Contact Address Email);
    my $target_regex = join( '|', @target_tags );

    my $parser = XML::Parser->new(
        Handlers => {
            Start => \&get_tagname,
            Char  => \&get_tagtext,
            End   => \&print_tagdata,
        }
    );

    for my $xmlfile ( <*.xml> ) {
        $parser->parsefile( $xmlfile );
    }

    # Start handler: remember the tag name and reset the text buffer
    sub get_tagname {
        $tagname = $_[1];
        $tagtext = '';
    }

    # Char handler: may be called several times per element, so append
    sub get_tagtext {
        $tagtext .= $_[1];
    }

    # End handler: print only exact matches, so that "ID" does not also
    # match some other tag that merely contains "ID"
    sub print_tagdata {
        if ( $_[1] =~ /^(?:$target_regex)$/ ) {
            print "$_[1] = $tagtext\n";
        }
    }
(updated to fix typo in target_tags list)

One notable difference between this and the OP code is that this will print tag labels and their contents in the order in which they occur in the XML files. If that's okay, then there's nothing more to worry about.

(But if you need to control tag order and it varies from one xml file to the next, you just need to add a global hash for storing tag values, then print the hash contents in the desired order after parsing each file. -- update: and don't forget to assign "()" to the hash, i.e. empty it, before parsing each file.)
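
To make that parenthetical concrete, here is a rough sketch of the hash approach. It is only one way to do it, not tested against your data: the %tagdata and %is_target hashes, the report_lines sub and the sample XML string are all names I made up for illustration, and I parse a string with parse() just to keep the example self-contained (for real files you would call parsefile() instead).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# The tags of interest, listed in the order they should be printed
my @target_tags = qw(ID TimeStamp IP_Address Title Complainant Contact Address Email);
my %is_target   = map { $_ => 1 } @target_tags;

my $tagtext = '';   # text collected for the element currently being read
my %tagdata;        # tag name => text (hypothetical name, refilled per document)

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { $tagtext = '' },
        Char  => sub { $tagtext .= $_[1] },
        End   => sub { $tagdata{ $_[1] } = $tagtext if $is_target{ $_[1] } },
    }
);

# Parse one XML document (passed as a string) and return "tag = text"
# lines in @target_tags order, skipping tags the document did not contain.
sub report_lines {
    my ($xml) = @_;
    %tagdata = ();    # empty the hash before parsing each document
    $parser->parse($xml);
    return map { "$_ = $tagdata{$_}" } grep { exists $tagdata{$_} } @target_tags;
}

# Made-up sample input: Title comes before ID here, but ID is printed first
my $sample = '<Report><Title>Spam</Title><ID>42</ID></Report>';
print "$_\n" for report_lines($sample);
```

Note that if a target tag occurs more than once in a document, this sketch keeps only the last occurrence; store array refs in the hash if you need all of them.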

XML::Parser is the surprisingly simple foundation on which many "higher-level" parsing modules are built. I'm actually surprised at how many CPAN modules have been created that are layers around XML::Parser, considering how easy and efficient this module is.

For relatively simple tasks like yours, the logic involved in using XML::Parser is pretty trivial, and when you use it, you really save a lot of effort, and end up with code that is simpler, more coherent, more robust, and easier to maintain.
