MrSnrub has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I have a text file in the following format:

<item><key1>someValue</key1><key2>someValue</key2><key3>someValue</key3><key4>someValue</key4></item><item><key1>someValue</key1><key2>someValue</key2><key3>someValue</key3><key4>someValue</key4></item><item><key1>someValue</key1><key2>someValue</key2><key3>someValue</key3><key4>someValue</key4></item> ... etc.

I need to run reports on this data (do a running total on some of the values; output some of these values to the screen as a table, etc.), and I assume the best way to do that is to turn it into an array of hashes: one item for each element of the array, and then the hashes will be the key names/values of each individual item. What is the best way to do this? I can read the data in like so:

    my $filename = "/path/to/my/inputfile.txt";
    my $filehandle;
    if (not open $filehandle, '<', $filename) {
        print "ERROR: Could not open file $filename\n";
        return 0;
    }
    while (my $line = <$filehandle>) {
        chomp($line);
        # lots of stuff goes here!!!
    }

I guess my first step is to create an array by splitting on "</item><item>" right at the "><" point (so "</item>" goes to element n and "<item>" goes to n + 1). How do I do that?

Re: How do I create an array of hashes from an input text file?
by Anonymous Monk on Nov 10, 2011 at 22:42 UTC

    I guess my first step is to create an array by splitting on "</item><item>"

    No, use any of XML::Simple, XML::Twig, XML::Rules, XML::LibXML ...

    For examples of using any of them, search for site:perlmonks.org "use XML::Simple", site:perlmonks.org "use XML::Twig", site:perlmonks.org "use XML::Rules", site:perlmonks.org "use XML::LibXML" ...

        use Data::Dumper;
        my $xml = <<'__XML__';
        <junk>
          <item>
            <key1>someValue</key1>
            <key2>someValue</key2>
            <key3>someValue</key3>
            <key4>someValue</key4>
          </item>
          <item>
            <key1>someValue</key1>
            <key2>someValue</key2>
            <key3>someValue</key3>
            <key4>someValue</key4>
          </item>
          <item>
            <key1>someValue</key1>
            <key2>someValue</key2>
            <key3>someValue</key3>
            <key4>someValue</key4>
          </item>
        </junk>
        __XML__
        use XML::Simple;
        print Dumper( XMLin( $xml ) );
        __END__
        $VAR1 = {
          'item' => [
            {
              'key2' => 'someValue',
              'key4' => 'someValue',
              'key1' => 'someValue',
              'key3' => 'someValue'
            },
            {
              'key2' => 'someValue',
              'key4' => 'someValue',
              'key1' => 'someValue',
              'key3' => 'someValue'
            },
            {
              'key2' => 'someValue',
              'key4' => 'someValue',
              'key1' => 'someValue',
              'key3' => 'someValue'
            }
          ]
        };
Re: How do I create an array of hashes from an input text file?
by Kc12349 (Monk) on Nov 10, 2011 at 23:30 UTC

    XML::Simple is probably the easiest place for you to start. Take a look at the ForceArray parameter to get a more consistent data structure. Once you have a data structure, dump it with Data::Dumper or the like to see what you're working with.

        use XML::Simple;
        use Data::Dumper;

        my $xml = '<xml><item><key1>someValue</key1><key2>someValue</key2><key3>someValue</key3><key4>someValue</key4></item><item><key1>someValue</key1><key2>someValue</key2><key3>someValue</key3><key4>someValue</key4></item></xml>';

        my $data = XMLin($xml, ForceArray => 1);
        print Dumper($data);
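To make the ForceArray point concrete, here is a small sketch of walking such a structure for a report. It does not call XML::Simple; the `$data` literal is an assumption about the shape XMLin returns with ForceArray => 1 (every element wrapped in an array reference), and the values are invented so there is something to total. Dump your real structure first and adjust.

```perl
use strict;
use warnings;

# Assumed shape of XMLin($xml, ForceArray => 1) for the <item> data.
# The values are made up for illustration; they are not from the thread.
my $data = {
    item => [
        { key1 => ['alpha'], key2 => ['Joe'], key3 => [10] },
        { key1 => ['beta'],  key2 => ['Ann'], key3 => [32] },
    ],
};

# Flatten each one-element array ref so every item becomes a plain hash.
my @items = map {
    my $item = $_;
    +{ map { ( $_ => $item->{$_}[0] ) } keys %$item };
} @{ $data->{item} };

# Print a simple table with a running total of key3 in the last column.
my $total = 0;
for my $item (@items) {
    $total += $item->{key3};
    printf "%-8s %-6s %5d %5d\n", @{$item}{qw(key1 key2 key3)}, $total;
}
```

Flattening once up front keeps the reporting loop free of `[0]` subscripts; if an element can legitimately repeat inside an item, keep the array references instead.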
Re: How do I create an array of hashes from an input text file?
by mrstlee (Beadle) on Nov 11, 2011 at 08:05 UTC
    XML::Simple or similar is undoubtedly the laziest path, but if you do actually want to get your hands dirty you could do something like:
        use strict;
        use warnings;
        use Data::Dumper;

        my $data = <<'XML-type';
        <item><key1>someValue</key1><key2>someValue</key2><key3>someValue</key3>
        <key4>someValue</key4></item><item><key1>someValue</key1><key2>someValue
        </key2><key3>someValue</key3><key4>someValue</key4></item><item>
        <key1>someValue</key1><key2>someValue</key2><key3>someValue</key3><key4>someValue</key4></item>
        XML-type

        # Read one record per closing </item> tag instead of one per line.
        local $/ = '</item>';
        open my $strh, '<', \$data or die "Cannot open in-memory file: $!";
        my @items = <$strh>;
        close $strh;

        # Strip the <item> wrappers, then capture tag/value pairs into a hash.
        my @processed = map {
            my $item = $_;
            $item =~ s/<.?item>//g;
            +{ $item =~ m{ <([^>]+)>([^<]+)< }gx };
        } @items;

        print Dumper \@processed;
    Prints:
        $VAR1 = [
          {
            'key2' => 'someValue',
            'key4' => 'someValue',
            'key1' => 'someValue',
            'key3' => 'someValue'
          },
          {
            'key2' => 'someValue
        ',
            'key4' => 'someValue',
            'key1' => 'someValue',
            'key3' => 'someValue'
          },
          {
            'key2' => 'someValue',
            'key4' => 'someValue',
            'key1' => 'someValue',
            'key3' => 'someValue'
          },
          {}
        ];
    For bonus points, work out why there is an empty hash at the end. (I haven't got time just now!)
    Have fun
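One likely answer to the bonus question, offered as a sketch rather than the poster's own explanation: with `$/` set to '</item>', the newline that follows the final `</item>` is read as one more record. It contains no tags, so the regex returns an empty list and you get an empty hash. Skipping records that contain no `<item>` avoids it. The sample data and variable names below are illustrative only.

```perl
use strict;
use warnings;

# Three items followed by a trailing newline, as a heredoc would produce.
my $data = join '', map { "<item><key1>v$_</key1><key2>w$_</key2></item>" } 1 .. 3;
$data .= "\n";

my @items;
{
    local $/ = '</item>';    # one record per closing </item> tag
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    @items = <$fh>;
    close $fh;
}

# The fourth record is just "\n": the source of the empty hash.
printf "records read: %d\n", scalar @items;

my @processed =
    map {
        my $item = $_;
        $item =~ s{</?item>}{}g;
        +{ $item =~ m{<([^>/]+)>([^<]*)<}g };
    }
    grep { /<item>/ }        # drop records that contain no item at all
    @items;

printf "hashes kept: %d\n", scalar @processed;
```

This prints "records read: 4" and then "hashes kept: 3".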
      I tried calling XMLin from XML::Simple and that does almost everything I want. Thanks for your help. One question: suppose my XML file is really big, and I only want to add data that meets a certain criterion (say, where the value of key2 is "Joe"). How do I filter that input?

        For very large files XML::Simple is probably not a good route. It will require you to load the entire XML data structure into memory.

        Should you see performance issues, you should take a look at XML::LibXML, which is much more powerful. It offers an interface to DOM and SAX parsers. In particular, SAX-based parsing may be the best choice if memory becomes an issue, as it is event-based as opposed to data-structure-based.

        SAX will offer more in the way of memory management while DOM will offer more speed at the price of a larger footprint.

        If you want to stick with an XML::Simple-style interface but just gain some speed, you can take a look at XML::Bare, which is written in XS and is among the fastest in terms of runtime. It has fewer niceties than XML::Simple, but offers an option to create the same style of data structures.
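As a rough illustration of filtering without loading the whole file, here is a core-Perl sketch. It is not SAX and not a real XML parser: it reuses the record-separator trick from earlier in the thread to read one `<item>` at a time, and it assumes flat items with no attributes, nesting, CDATA or entities. For anything less regular, a streaming parser such as XML::Twig or XML::LibXML::Reader is the safer choice. The helper name `filter_items` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read <item> records one at a time from a filehandle
# and keep only those for which $want->($hashref) returns true.
sub filter_items {
    my ($fh, $want) = @_;
    my @kept;
    local $/ = '</item>';                     # one record per item, not per line
    while (my $record = <$fh>) {
        next unless $record =~ /<item>/;      # skip trailing whitespace records
        $record =~ s{</?item>}{}g;
        my %item = $record =~ m{<([^>/]+)>([^<]*)<}g;
        push @kept, \%item if $want->(\%item);
    }
    return \@kept;
}

# Stand-in for the real input; in practice open the file itself instead.
my $xml = '<item><key1>1</key1><key2>Joe</key2></item>'
        . '<item><key1>2</key1><key2>Ann</key2></item>'
        . '<item><key1>3</key1><key2>Joe</key2></item>';

open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";
my $joes = filter_items($fh, sub { ($_[0]{key2} // '') eq 'Joe' });
close $fh;

print "$_->{key1}\n" for @$joes;
```

Only the matching items are held in memory at the end, so peak usage depends on how many records pass the filter rather than on the size of the file. With the sample data this prints 1 and 3.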