
The following comments do not represent the "consensus" view among responsible monks at the Monastery. But they are in the spirit of "TIMTOWTDI"...

Sometimes, an XML job is really simple, for instance when the job is to read XML data created by some task-specific program that does nothing but put tags around the columns of a particular flat table -- which appears to be what you have in this case. In effect, if you had access to the original flat table (wherever/whatever it may be) before its contents were decorated with XML tags, you wouldn't need to "parse" XML at all; you would just read the table.

And sometimes, if the XML module(s) you would like are not installed for the perl interpreter you're using (e.g. on a web server that you don't control), it can be... um, a bit complicated or time-consuming to get them installed, or to incorporate one of them into your own script. But if you know that the job is just a matter of stripping tags out of an XML-ized flat table, (warning: heresy alert (: ) you probably don't need an XML parser for that.

You could read the input like this (not tested):

my @tags = qw/NAME LOCATION TIME DATE PRIORITY ATTENDEES DESCRIPTION/;

my @events;

open( my $xml, '<', 'datafile.xml' ) or die "Can't open datafile.xml: $!";
{
   local $/ = "</EVENT>";  # input record separator is end-tag

   while (<$xml>)   # read one whole <EVENT>...</EVENT> into $_
   {
      my %record = ();
      for my $t ( @tags )
      {
         if ( m{<$t>\s*([^<]+)} ) # capture text following an open tag
         {
            $record{$t} = $1;
            $record{$t} =~ s/\s+$//; # optional: remove trailing spaces
         }
      }
      # skip the tag-less chunk that follows the last </EVENT>
      push @events, { %record } if %record; # @events is an array of hashes
   }
}
close $xml;

# to get back to the data for later use:

for my $i ( 0 .. $#events ) {
    my $href = $events[$i]; # you get a reference to the hash
    my %rec_hash = %$href;  # you can make a local copy of it, or
    print "Event #", $i+1, ":\n";
    print " $_ = $$href{$_}\n" for ( keys %$href ); # just use the hash ref
}
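
Since the snippet above is untested, here is a minimal self-contained sketch of the same technique that you can run without a data file. The sample data, the outer <EVENTS> wrapper and the read_events() helper are all made up for illustration; only the tag names come from the list above. It reads from an in-memory filehandle, so nothing here depends on what your real file looks like.

```perl
use strict;
use warnings;

# Hypothetical sample input: the values and the <EVENTS> wrapper are
# invented for illustration, not taken from the original data.
my $sample = <<'XML';
<EVENTS>
<EVENT>
  <NAME>Staff meeting</NAME>
  <LOCATION>Room 101</LOCATION>
  <PRIORITY>high</PRIORITY>
</EVENT>
<EVENT>
  <NAME>Lunch</NAME>
  <LOCATION> Cafeteria </LOCATION>
</EVENT>
</EVENTS>
XML

my @tags = qw/NAME LOCATION TIME DATE PRIORITY ATTENDEES DESCRIPTION/;

# read_events() is a hypothetical helper name: it takes an open
# filehandle and returns a list of hash refs, one per <EVENT>.
sub read_events {
    my ($fh) = @_;
    my @found;
    local $/ = "</EVENT>";    # one record per closing tag
    while ( my $chunk = <$fh> ) {
        my %record;
        for my $t (@tags) {
            if ( $chunk =~ m{<$t>\s*([^<]+)} ) {
                ( $record{$t} = $1 ) =~ s/\s+$//;    # trim trailing space
            }
        }
        # the chunk after the last </EVENT> has no fields; skip it
        push @found, { %record } if %record;
    }
    return @found;
}

# Open the string as an in-memory file instead of a file on disk.
open( my $fh, '<', \$sample ) or die "Can't open in-memory file: $!";
my @events = read_events($fh);
close $fh;

for my $i ( 0 .. $#events ) {
    print "Event #", $i + 1, ":\n";
    print "  $_ = $events[$i]{$_}\n" for sort keys %{ $events[$i] };
}
```

Tags that are missing from an event (TIME, DATE and so on in the sample) simply don't show up as keys in that event's hash, so check with exists before using them.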

Now for the caveats... Your XML data is not simple (and this kind of simple solution will not work) if the input is not really like a flat table. This would be the case if:

an event can have two or more instances of a given tag (e.g. multiple descriptions)
a given tag within an event can contain optional or variable nested tags (e.g. if "attendees" included XML-tagged sub-categories like "invited" vs. "present")
any of the tags can take optional or variable attributes (e.g. <TIME zone="EST">...)

If your input has any of these features, you could elaborate the "non-parser" approach to handle them, but you might soon reach the point of "diminishing returns", where it would have been better to start with an actual XML parsing module.
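
To give an idea of what "elaborating" might look like, here is a rough sketch (lightly reasoned through, not battle-tested) that copes with repeated tags and with attributes on the open tag. The sample chunk and the value/attrs layout of the result are my own assumptions, chosen just for illustration. It still does nothing about nested tags, single-quoted attribute values or entities, which is about where the diminishing returns start.

```perl
use strict;
use warnings;

# Hypothetical <EVENT> chunk with a repeated tag and an attribute.
my $chunk = <<'XML';
<EVENT>
  <NAME>Review</NAME>
  <TIME zone="EST">10:00</TIME>
  <DESCRIPTION>First note</DESCRIPTION>
  <DESCRIPTION>Second note</DESCRIPTION>
</EVENT>
XML

my @tags = qw/NAME TIME DESCRIPTION/;
my %record;    # tag name => array of { value => ..., attrs => {...} }

for my $t (@tags) {
    # /g walks over every instance of the tag;
    # [^>]* tolerates attributes inside the open tag.
    while ( $chunk =~ m{<$t\b([^>]*)>\s*([^<]*?)\s*</$t>}g ) {
        my ( $attr_text, $value ) = ( $1, $2 );

        # Pull out name="value" pairs (double-quoted values only).
        my %attrs = $attr_text =~ m{(\w+)\s*=\s*"([^"]*)"}g;

        push @{ $record{$t} }, { value => $value, attrs => \%attrs };
    }
}

for my $t (@tags) {
    for my $item ( @{ $record{$t} || [] } ) {
        my $attrs = join ', ',
            map { "$_=$item->{attrs}{$_}" } sort keys %{ $item->{attrs} };
        print "$t: $item->{value}", ( $attrs ? " ($attrs)" : '' ), "\n";
    }
}
```

Every field is now an array of little hashes, even when there is only one instance, so the code that uses the data gets more complicated too. That is part of the price of going down this road.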

In reply to Re: XML Parsing by graff
in thread XML Parsing by JoeJaz
