The problem you asked about has been solved by jwkrahn and ikegami above, and GrandFather has recommended his favorite XML parsing method. That leaves it to me to point out two other things:
  1. Your algorithm is inefficient to the point of being somewhat embarrassing: given 8 tags of interest, you read every line of every file 8 times, looking for each tag in turn. That does not scale well as the number of tags, the number of files, or the size of the files grows.
  2. If you think XML::Twig's mile-long manual is overkill, I agree, but you should still be using some sort of XML parsing approach, because That's The Right Way To Do It (and There's More Than One Way To Do It with XML parsing).

Here's a simple way, which I've tested on a directory containing a couple of XML files that probably have enough in common with the ones you have. Reading the first 250 lines of the XML::Parser manual was sufficient to know how to write this:

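    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Parser;

    my ( $tagname, $tagtext );  # "globals" used in callback subs

    my @target_tags = qw(ID TimeStamp IP_Address Title Complainant Contact Address Email);
    my $target_list = join( '|', @target_tags );
    # anchored, so that a tag name must match in full, not just contain a target
    my $target_regex = qr/^(?:$target_list)$/;

    my $parser = XML::Parser->new( Handlers => { Start => \&get_tagname,
                                                 Char  => \&get_tagtext,
                                                 End   => \&print_tagdata,
                                               } );

    for my $xmlfile ( <*.xml> ) {
        $parser->parsefile( $xmlfile );
    }

    # Start handler: remember the tag name and reset the text buffer
    sub get_tagname {
        $tagname = $_[1];
        $tagtext = '';
    }

    # Char handler: may be called several times per element, so append
    sub get_tagtext {
        $tagtext .= $_[1];
    }

    # End handler: print the collected text if this is a tag of interest
    sub print_tagdata {
        if ( $_[1] =~ $target_regex ) {
            print "$_[1] = $tagtext\n";
        }
    }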
(updated to fix typo in target_tags list)

One notable difference between this and the OP code is that this will print tag labels and their contents in the order in which they occur in the XML files. If that's okay, then there's nothing more to worry about.

(But if you need to control tag order and it varies from one xml file to the next, you just need to add a global hash for storing tag values, then print the hash contents in the desired order after parsing each file. -- update: and don't forget to assign "()" to the hash, i.e. empty it, before parsing each file.)
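
A minimal sketch of that hash-based variant, in case it helps. The names %value, %is_target and report_lines are mine, purely for illustration, and the sample XML string is made up. It parses a string with parse() so that it runs on its own; for your files you would call parsefile() inside the sub instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my @target_tags = qw(ID TimeStamp IP_Address Title Complainant Contact Address Email);
my %is_target   = map { $_ => 1 } @target_tags;   # hypothetical lookup table

my $tagtext = '';   # text collected for the element currently being read
my %value;          # tag name => text, for the document being parsed

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { $tagtext = '' },
        Char  => sub { $tagtext .= $_[1] },
        End   => sub { $value{ $_[1] } = $tagtext if $is_target{ $_[1] } },
    }
);

# Parse one XML string; return "tag = text" lines in @target_tags order,
# skipping any target tag that the document does not contain.
sub report_lines {
    my ($xml) = @_;
    %value = ();                # empty the hash before each document
    $parser->parse($xml);
    return map { "$_ = $value{$_}" } grep { exists $value{$_} } @target_tags;
}

# Made-up sample in which Title comes before ID
my $sample = '<Report><Title>Spam</Title><ID>42</ID></Report>';
print "$_\n" for report_lines($sample);
```

The lines come out in the order of @target_tags rather than document order. If a tag occurs more than once in a file, this sketch keeps only the last occurrence, so it would need an array per tag in that case.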

XML::Parser is the surprisingly simple foundation on which many "higher-level" parsing modules are built. I'm actually surprised at how many CPAN modules have been created that are layers around XML::Parser, considering how easy and efficient this module is.

For relatively simple tasks like yours, the logic involved in using XML::Parser is pretty trivial, and when you use it, you really save a lot of effort, and end up with code that is simpler, more coherent, more robust, and easier to maintain.


In reply to Re: I don't understand why I'm getting an "Use of uninitialized value" error by graff
in thread I don't understand why I'm getting an "Use of uninitialized value" error by TheBigAmbulance
