Parsing an Array of Hashes, Modifying Key/Value Pairs

librarat has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monastery Dwellers,

I am a new Perl convert, still trying to master the basics of the language. Coming from Python, I've had a bit of trouble with the concept of object references, so I've come to ask for some clarification.

I've started out by reading an XML file using XML::Simple, but have been urged to move to XML::LibXML.

My goal is to iterate through the loaded hash and remove key/value pairs from each hash within the array where key names are listed in a pre-defined array.

This is a sample of the XML that I'm loading:

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<smses count="4110">
  <sms protocol="0" address="1234567890" date="1288032888762" type="1" subject="null" body="Hey, I'm out of class." toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" />
  <sms protocol="0" address="0987654321" date="1288032888762" type="1" subject="null" body="What are you up to this afternoon?" toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" />
</smses>

The resulting data structure (via Data::Dumper) looks like this:

#$VAR1 = {
#          'count' => '5509',
#          'sms' => [
#                   {
#                     'protocol' => '0',
#                     'locked' => '0',
#                     'status' => '0',
#                     'date' => '1288194026703',
#                     'subject' => 'null',
#                     'toa' => 'null',
#                     'sc_toa' => 'null',
#                     'body' => 'You should come to dinner with us! :)',
#                     'read' => '1',
#                     'address' => '(123) 456-7890',
#                     'type' => '2',
#                     'service_center' => 'null'
#                   },
#                   {
#                     'protocol' => '0',
#                     'locked' => '0',
#                     'status' => '64',
#                     'date' => '1288224316833',
#                     'subject' => 'null',
#                     'toa' => 'null',
#                     'sc_toa' => 'null',
#                     'body' => 'INVADER ZIM!!',
#                     'read' => '1',
#                     'address' => '0987654321',
#                     'type' => '2',
#                     'service_center' => 'null'
#                   },

Ignoring the actual values of the keys, the XML above loads into a hashref that looks like the Dumper output above. What I'm trying to do is iterate through all of the hashes in the sms array. Within each hash, all I'm trying to do is delete the key/value pairs whose key name does NOT appear in a predefined list (@keysToKeep).

#! /usr/bin/env perl

use XML::Simple;
use Data::Dumper;
use strict;
use warnings;

########
###########  Variables
########

# List of info we want for each message entry
my @keysToKeep                  =               ('date', 'body', 'address', 'type');

# XML Parser
my $xml                         =               XML::Simple->new;


########
###########  Subroutines
########

# Filepath is passed to load a hashref
# a CLEANED Hashref is returned of the xml we've loaded
sub loadFileToHashRef
{
        my $xmlRef      =       $xml->XMLin($_[0]);

        # Clean our hash
        for my $textRef ( @{$xmlRef->{sms}} )
        {   
                foreach my $key (keys %{$textRef})
                {   
                        # If your $key is *not* what we're looking for, remove it from the hash
                        if ( !grep ($key, @keysToKeep) )
                        {   
                                delete $textRef->{$key};
                        }   
                }   
        }   

        # Return our hash ref
        return $xmlRef;
}

my $hashRef = loadFileToHashRef("xml/sms-20110704030000.xml");
print Dumper $hashRef;

Now, this code doesn't remove elements from the hash and I know why. Using keys only returns a list of the pairs that I'm modifying. I'm not actually modifying the hashref. My question becomes this: How do I modify the hashref (%hashRef) directly?

Given that the above doesn't do what I'm trying to do, I started working with 'each'. Here's what I came up with.

#! /usr/bin/env perl

use XML::Simple;
use Data::Dumper;
use strict;
use warnings;
use 5.012;

########
###########  Variables
########
# List of info we want for each message entry
my @keysToKeep                  =               ('date', 'body', 'address', 'type');

# XML Parser
my $xml                         =               XML::Simple->new;


########
###########  Subroutines
########

# Filepath is passed to load a hashref
# a CLEANED Hashref is returned of the xml we've loaded
sub loadFileToHashRef
{
        my $xmlRef      =       $xml->XMLin($_[0]);

        while (($key, $value) = each %{$xmlRef})
        {   
                print "Key: $key, Value: $value \n"; 
        }   

        # Return our hash ref
        return $xmlRef;
}

my $hashRef = loadFileToHashRef("xml/sms-20110704030000.xml");
print Dumper $hashRef;

However, I continually get the following, and I don't understand why.

Global symbol "$key" requires explicit package name at ./parse.pl line 29.
Global symbol "$value" requires explicit package name at ./parse.pl line 29.
Global symbol "$key" requires explicit package name at ./parse.pl line 31.
Global symbol "$value" requires explicit package name at ./parse.pl line 31.
Execution of ./parse.pl aborted due to compilation errors.

I also now know that this won't do what I'm trying to do either as the each function doesn't allow for element removal from the hash.

I highly suspect that my PHP and Python ideologies are getting in the way of my understanding of how Perl handles hashes and arrays (as it pertains to references). Ideally, I'm looking for the following feedback:

The "right" way to accomplish what I'm trying to do (I realize this is subjective to a degree)
How do I accomplish this in the most resource-efficient way possible?
How to tell when you'll be working with a reference or a literal (e.g. XML::Simple returns a reference, but I assume other libraries/functions won't?)

-- Libra


Re: Parsing an Array of Hashes, Modifying Key/Value Pairs by LanX (Saint) on May 12, 2013 at 18:20 UTC
> However, I continually get the following, and I don't understand why.
    Global symbol "$key" requires explicit package name at ./parse.pl line 29.
    Global symbol "$value" requires explicit package name at ./parse.pl line 29.
you forgot `my`:
    while (my ($key, $value) = each %{$xmlRef})
edit: For historical reasons, package variables ("global symbols") are the default in Perl; lexical variables were introduced later. Under `strict` you have to declare explicitly, with `our` or `my`, which kind of variable you want¹. This has the advantage over Python's implicit declaration that typos are caught more easily.
Cheers Rolf ( addicted to the Perl Programming Language)
update: ¹ or provide an "explicit package name": `$package::name` works without `our`.
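To tie the `my` fix back to the original goal, here is a minimal sketch that declares the loop variables with `my` and deletes unwanted keys while iterating with `each`. It relies on the exception documented in perlfunc's `each` entry, that deleting the key most recently returned by `each` is safe; the sample record and the `%keep` lookup hash are illustrative assumptions, not code from the thread.

```perl
use strict;
use warnings;

# Keep-list from the question, turned into a lookup hash (an assumption for this sketch)
my @keysToKeep = ('date', 'body', 'address', 'type');
my %keep = map { $_ => 1 } @keysToKeep;

# Hypothetical record, values borrowed from the Dumper output in the question
my $textRef = {
    protocol => '0',
    date     => '1288224316833',
    body     => 'INVADER ZIM!!',
    address  => '0987654321',
    type     => '2',
    read     => '1',
};

# 'my' declares $key and $value as lexicals scoped to the loop,
# which is what 'use strict' was asking for.
while (my ($key, $value) = each %{$textRef}) {
    # Deleting the key that each() just returned is the documented safe case.
    delete $textRef->{$key} unless $keep{$key};
}

print join(', ', sort keys %{$textRef}), "\n";   # address, body, date, type
```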
Re^2: Parsing an Array of Hashes, Modifying Key/Value Pairs by librarat (Novice) on May 12, 2013 at 21:13 UTC
Many thanks! This makes much more sense now that I'm thinking about it. Strict is fast becoming my best friend :)
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs by hdb (Monsignor) on May 12, 2013 at 18:33 UTC
You could also more actively "keep" the desired fields:
    sub loadFileToHashRef
    {
        my $xmlRef = $xml->XMLin($_[0]);
        # Clean our hash
        for my $textRef ( @{$xmlRef->{sms}} )
        {
            %{$textRef} = map { $_ => $textRef->{$_} } @keysToKeep;
        }
        # Return our hash ref
        return $xmlRef;
    }
If they didn't exist in the XML, you will get some undefs.
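A sketch of one variant of this approach, assuming the same `@keysToKeep` list: adding a `grep { exists ... }` filter copies only the fields a record actually has, so missing fields are skipped instead of showing up as undef. The two sample records are made up for illustration.

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');

# Hypothetical parsed data in the shape XML::Simple produced in the question
my $xmlRef = {
    count => '2',
    sms   => [
        { date => '1', body => 'hi', address => '123', type => '1', read => '1' },
        { date => '2', body => 'yo', read => '0' },   # no address/type here
    ],
};

for my $textRef (@{ $xmlRef->{sms} }) {
    # map builds the whole list before the assignment replaces %$textRef;
    # grep drops names the record never had, so no undef values appear.
    %{$textRef} = map  { $_ => $textRef->{$_} }
                  grep { exists $textRef->{$_} } @keysToKeep;
}
```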
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs by LanX (Saint) on May 12, 2013 at 18:36 UTC
> `# If your $key is not what we're looking for, remove it from the hash`
> `if ( !grep ($key, @keysToKeep) )`
What you want is something like Python's `in` operator, but the Perl way to check inclusion within a set is to use a hash, so `if ($keysToKeep{$key})` (with a %keysToKeep hash whose keys are the wanted names) would do it most efficiently.
There is a new smart-match operator `~~` that works like `in`, but it's still somewhat experimental:
    DB<137> 1 ~~ [1,2,3]
    => 1
    DB<138> "a" ~~ [1,2,3]
    => ""
    DB<139> "4" ~~ [1,2,3]
    => ""
    DB<140> "2" ~~ [1,2,3]
    => 1
edit: Anyway, using a hash slice is the fastest way to filter keys:
    DB<145> %h=(a=>1,b=>2,c=>3)
    => ("a", 1, "b", 2, "c", 3)
    DB<146> @keysToDelete=qw/b c d/
    => ("b", "c", "d")
    DB<147> delete @h{@keysToDelete}
    => (2, 3, undef)
    DB<148> \%h
    => { a => 1 }
update: or
    DB<153> %h=(a=>1,b=>2,c=>3)
    => ("a", 1, "b", 2, "c", 3)
    DB<154> @keysToKeep=qw/a e/
    => ("a", "e")
    DB<155> %h2=()
    DB<156> @h2{@keysToKeep}=@h{@keysToKeep}
    => (1, undef)
    DB<157> \%h2
    => { a => 1, e => undef }
but please notice how key "e" jumped into existence now, which is no problem if you use a guaranteed subset.
Cheers Rolf
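A minimal sketch of the lookup-hash idea applied to the loop from the original question. The `%keysToKeep` hash is an assumption built from the existing `@keysToKeep` array (Perl keeps an array and a hash with the same name as separate variables), and the sample record is hypothetical. It also shows that deleting inside a `for ... keys` loop is fine, because `keys` builds its list before the loop body runs.

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');

# Build the set once: a hash whose keys are the wanted names.
# This is the usual Perl stand-in for Python's "key in some_set".
my %keysToKeep = map { $_ => 1 } @keysToKeep;

# Hypothetical record
my $textRef = { date => '1', body => 'hi', subject => 'null', toa => 'null' };

for my $key (keys %{$textRef}) {
    # One hash lookup per key, instead of scanning @keysToKeep with grep.
    delete $textRef->{$key} unless $keysToKeep{$key};
}
```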
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs by NetWallah (Canon) on May 12, 2013 at 18:35 UTC
This is just a hunch, and I'm too lazy to benchmark it to prove it, but I think you may be better off simply copying the key-value pairs you want to KEEP to a different hashref, as opposed to deleting the ones you don't want. This would be easier programmatically, as well as potentially more CPU-efficient at a negligible memory cost.
BTW - welcome to the Monastery - and that is an excellent first post (++)!
"I'm fairly sure if they took porn off the Internet, there'd only be one website left, and it'd be called 'Bring Back the Porn!'" -- Dr. Cox, Scrubs
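One way to sketch this copy-instead-of-delete suggestion, assuming the data shape from the question: copy the wanted fields into a fresh hash with a hash slice (the technique from LanX's update above), then replace each array element with a reference to the copy. The `exists` filter avoids the undef entries LanX pointed out; the sample data is hypothetical.

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');

# Hypothetical data in the shape shown in the question
my $xmlRef = {
    sms => [
        { date => '1', body => 'hi', address => '123', type => '1', status => '0' },
        { date => '2', body => 'yo', locked => '0' },
    ],
};

for my $sms (@{ $xmlRef->{sms} }) {
    # Only copy keys the record actually has, so missing ones
    # are not created with undef values.
    my @present = grep { exists $sms->{$_} } @keysToKeep;

    my %copy;
    @copy{@present} = @{$sms}{@present};   # hash-slice copy into a fresh hash

    # $sms aliases the array element, so this swaps in the new hashref.
    $sms = \%copy;
}
```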
Re^2: Parsing an Array of Hashes, Modifying Key/Value Pairs by librarat (Novice) on May 12, 2013 at 21:03 UTC
NetWallah, thank you for the tips! I've managed to dig a bit further and make more sense of what was confusing me. I wound up benchmarking your suggestion against removing key pairs, and you were right: marginally more memory usage, but pretty well moot. I've got two follow-up questions, though:
Can you provide me something (link, book, quick write-up, etc.) about how to tell, while you're coding, whether or not you're working with a reference? Or is that just something I'll pick up as I write more code?
What's the difference between `${$smsHash->{date}}` and `%smsHash['date']` (as seen on line 33 in the code below)?
Minus the error I'm getting, this is the solution to my initial query.
As I'm still uncertain about strict references, the solution to this doesn't jump out at me (as I think I'm pretty strict about what data I want and in which order) ☺
    Can't use string ("1288032888762") as a SCALAR ref while "strict refs" in use at ./parse.pl line 33.
Cheers!
-- Libra
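To illustrate the second question, a small sketch with a hypothetical `$smsHash` record: `$smsHash->{date}` fetches the value stored under 'date', while `${ ... }` additionally treats that value as a scalar reference, which is what triggers the "strict refs" error when the value is a plain string. (`%smsHash['date']` is not a way to reach an element through a hash reference at all.)

```perl
use strict;
use warnings;

# Hypothetical record shaped like one entry of the sms array
my $smsHash = { date => '1288032888762', body => 'INVADER ZIM!!' };

# Element access through a hash reference: usually what you want.
my $date = $smsHash->{date};          # the string '1288032888762'

# ${ EXPR } dereferences EXPR as a scalar reference. The value stored under
# 'date' is a plain string, not a reference, so under "strict refs" this dies.
my $ok  = eval { my $x = ${ $smsHash->{date} }; 1 };
my $err = $ok ? '' : $@;
print "deref failed: $err" unless $ok;

# ${ ... } only works when the stored value really is a scalar reference:
my $holder = { date => \'1288032888762' };
my $same   = ${ $holder->{date} };    # '1288032888762'
```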
Re^3: Parsing an Array of Hashes, Modifying Key/Value Pairs (references) by LanX (Saint) on May 12, 2013 at 22:01 UTC
> Can you provide me something (link, book, quick write-up, etc) about how to tell while you're coding whether or not you're working with a reference? Or is that just something I'll pick up as I write more code?
see perlref and ref
Perl uses sigils and (de)reference operators to show whether you're dealing with a ref.
The default for @arrays and %hashes is what I call their "list form", i.e. you always copy the content when assigning them:
    DB<100> @a=(1,2,3)
    => (1, 2, 3)
    DB<101> @b=@a
    => (1, 2, 3)
    DB<102> $b[0]=42
    => 42
    DB<103> $a[0]  # @a didn't change
    => 1
As you noticed, $a[0] accesses the first element of @a; you need $ because $a[0] is a scalar and not an array.
But Python's behavior is to default to the "reference form" for arrays and hashes and to the non-reference form for simple variables:
    >>> a=[1,2,3]
    >>> b=a
    >>> b[0]=42
    >>> a[0]
    42
To achieve this behavior (e.g. to be able to nest data) you need references in Perl, and refs are $scalars:
    DB<104> $a=[1,2,3]
    => [1, 2, 3]
    DB<105> $b=$a
    => [1, 2, 3]
    DB<106> $b->[0]=42
    => 42
    DB<107> $a->[0]  # original changed
    => 42
You need an explicit dereference operator `->` to distinguish between the array @a and the referenced array in $a (yes, they are different)¹.
There is also a reference operator `\` to get the reference of a variable. This way the above example can be written:
    DB<108> \@a
    => [1, 2, 3]
    DB<109> $b=\@a
    => [1, 2, 3]
    DB<110> $b->[0]=666
    => 666
    DB<111> $a[0]  # original changed
    => 666
There is an alternative way to dereference by prepending the intended sigil:
    DB<112> @$b
    => (666, 2, 3)  # now a (list) not an [array-ref]
but you need to be careful about precedence when dealing with nested structures.
There is more to say, but this should be enough for the beginning. Please ask if you have further questions.
Cheers Rolf
¹ There are no special sigils to denote an array_ref or a hash_ref in Perl, which is unfortunate IMO, because one could avoid deref-operators this way! But the keyboard is already exhausted, and things like £¢ are difficult to type. I'm using a naming convention to mark them, e.g. `$a_names` for an array_ref of names.
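For the "how do I tell?" part of the question, the `ref` built-in mentioned above can be checked at run time. A minimal sketch with made-up data and a made-up class name; `reftype` comes from the core module Scalar::Util and reports the underlying type even for blessed objects, such as those many libraries return.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);   # core module

my @list  = (1, 2, 3);
my $aref  = \@list;                            # reference taken with \
my $href  = { sms => [ { date => '1' } ] };    # nested, like XML::Simple output
my $plain = 'just a string';

# ref() returns the empty string for a non-reference and a type name
# ('ARRAY', 'HASH', 'SCALAR', 'CODE', ...) for a reference.
my @types = map { ref($_) || 'not a reference' }
            ($aref, $href, $href->{sms}, $href->{sms}[0], $plain);
print "@types\n";   # ARRAY HASH ARRAY HASH not a reference

# Objects are blessed references: ref() reports the class name,
# reftype() reports the underlying data type.
my $obj = bless {}, 'Hypothetical::Node';      # made-up class name
print ref($obj), ' / ', reftype($obj), "\n";   # Hypothetical::Node / HASH
```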
Re^3: Parsing an Array of Hashes, Modifying Key/Value Pairs by NetWallah (Canon) on May 13, 2013 at 01:42 UTC
I think the syntax you are looking for is:
    push @textsToKeep, [ $smsHash->{date}, $smsHash->{body}, $smsHash->{address}, $smsHash->{type} ];  # Adding ONE entry, which is an array-ref.
This makes @textsToKeep an AoA (array of arrays), which, technically, is an array of array-refs. You can access individual items as follows:
    my $body_for_third_item     = $textsToKeep[2][1];   # Indexes start at 0
    my $address_for_second_item = $textsToKeep[1][2];
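If the fields should stay addressable by name rather than by position, a possible alternative (not from the thread) is to push a hash reference instead of an array reference, which makes @textsToKeep an array of hashes. A sketch with hypothetical input records:

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');

# Hypothetical input records
my @sms = (
    { date => '1', body => 'hi', address => '123', type => '1', read => '1' },
    { date => '2', body => 'yo', address => '456', type => '2', read => '0' },
);

my @textsToKeep;
for my $smsHash (@sms) {
    # +{ ... } is an anonymous hash constructor, so each pushed entry
    # is one hashref: @textsToKeep becomes an array of hashes.
    push @textsToKeep, +{ map { $_ => $smsHash->{$_} } @keysToKeep };
}

# Fields are then accessed by name rather than by position.
my $address_for_second_item = $textsToKeep[1]{address};   # '456'
```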
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs by kcott (Archbishop) on May 13, 2013 at 05:46 UTC
G'day librarat,
"My goal is to iterate through the loaded hash and remove key/value pairs from each hash within the array where key names are listed in a pre-defined array."
I'd probably tackle that like this:
    perl -Mstrict -Mwarnings -e '
        use Data::Dumper;
        my @keeps = qw{a b};
        my %to_keep = map { $_ => 1 } @keeps;
        my $data = { count => 5509, sms => [{a => 1, b => 2, c => 3, d => 4}, {b => 2, c => 3, e => 5}] };
        print Dumper $data;
        for my $sms_ref (@{$data->{sms}}) {
            delete @$sms_ref{ grep { ! $to_keep{$_} } keys %$sms_ref };
        }
        print Dumper $data;
    '
    $VAR1 = {
              'count' => 5509,
              'sms' => [
                         { 'c' => 3, 'a' => 1, 'b' => 2, 'd' => 4 },
                         { 'e' => 5, 'c' => 3, 'b' => 2 }
                       ]
            };
    $VAR1 = {
              'count' => 5509,
              'sms' => [
                         { 'a' => 1, 'b' => 2 },
                         { 'b' => 2 }
                       ]
            };
Update: Removed the `exists` from `... grep { ! exists $to_keep{$_} } ...`. It was redundant - see discussion below. Thanks ++LanX.
-- Ken
Re^2: Parsing an Array of Hashes, Modifying Key/Value Pairs by LanX (Saint) on May 13, 2013 at 09:42 UTC
G'day kcott, just wondering:
    my @keeps = qw{a b};
    my %to_keep = map { $_ => 1 } @keeps;
    ...
    delete @$sms_ref{ grep { ! exists $to_keep{$_} } keys %$sms_ref };
Using `exists` is a bit redundant if you already initialize the hash with true values... so either dropping the `exists`:
    ... delete @$sms_ref{ grep { ! $to_keep{$_} } keys %$sms_ref };
or initializing with `undef` (and keeping the `exists`):
    my %to_keep; @to_keep{ qw{a b} } = (); ...
should have the same effect. Or am I missing something?
Cheers Rolf
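A small sketch of the equivalence described above, using kcott's `a`/`b` keep-list and a made-up list of keys: a truth test works when the lookup hash holds true values, `exists` is needed when the values are undef, and mixing the two up silently drops everything. The last line also checks that a plain lookup does not create keys.

```perl
use strict;
use warnings;

my @keeps = qw{a b};

# Variant 1: true values, so a plain truth test is enough.
my %to_keep_true = map { $_ => 1 } @keeps;

# Variant 2: the keys exist but every value is undef, so 'exists' is required.
my %to_keep_undef;
@to_keep_undef{@keeps} = ();

my @keys = qw{a b c d};   # hypothetical record keys
my @drop_true  = grep { ! $to_keep_true{$_} } @keys;
my @drop_undef = grep { ! exists $to_keep_undef{$_} } @keys;
print "@drop_true | @drop_undef\n";   # c d | c d

# Mixing them up: a truth test on the undef-valued hash drops everything.
my @wrong = grep { ! $to_keep_undef{$_} } @keys;   # a b c d

# Simple rvalue lookups of 'c' and 'd' above did not add those keys.
my $count = keys %to_keep_true;   # still 2
```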
Re^3: Parsing an Array of Hashes, Modifying Key/Value Pairs by kcott (Archbishop) on May 14, 2013 at 04:11 UTC
Thanks for picking that up - I've updated my node. I suspect my intent, in including the `exists`, was to avoid autovivification; however, no autovivification occurs there anyway. I think it might have done long ago (Perl 3/4?).
-- Ken
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs by hdb (Monsignor) on May 12, 2013 at 18:22 UTC
Your usage of `grep` is incorrect. In your first script, use
    if ( !grep(/$key/, @keysToKeep) )
to make it work.
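Why the original test never deleted anything: in `grep(EXPR, LIST)` the expression is evaluated once per element with `$_` set to that element, and a bare non-empty `$key` is true every time. So the problem in the first script was the `grep`, not `keys` or the reference. A minimal sketch with hypothetical data:

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');
my $key = 'subject';   # a key we do NOT want to keep

# EXPR here is just $key, a non-empty string, so it is true for
# every element and grep counts all of them.
my $buggy = grep($key, @keysToKeep);            # 4, i.e. true
# Comparing each element ($_) with $key gives the intended membership test.
my $fixed = grep { $_ eq $key } @keysToKeep;    # 0, i.e. false

# The delete itself was never the problem: deleting through a reference
# changes the hash the reference points to.
my $data = { sms => [ { date => '1', subject => 'null' } ] };
for my $textRef (@{ $data->{sms} }) {
    for my $k (keys %{$textRef}) {
        delete $textRef->{$k} unless grep { $_ eq $k } @keysToKeep;
    }
}
print join(',', keys %{ $data->{sms}[0] }), "\n";   # date
```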
Re^2: Parsing an Array of Hashes, Modifying Key/Value Pairs by LanX (Saint) on May 12, 2013 at 18:50 UTC
> `if ( !grep(/$key/, @keysToKeep) )`
Careful! Without anchors you'll also match on substrings, so better `/^$key$/`. Anyway, try to avoid regexes when possible and use `eq`:
    if ( not grep { $key eq $_ } @keysToKeep )
Cheers Rolf
edit: corrected code
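A quick sketch of the substring pitfall, using a hypothetical unwanted key `dress` (a substring of `address`). `\Q...\E` is added to the anchored version so that any regex metacharacters in a key are matched literally; the `eq` version needs no such care.

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');
my $key = 'dress';   # hypothetical unwanted key, a substring of 'address'

my $unanchored = grep { /$key/ } @keysToKeep;          # 1: matches inside 'address'
my $anchored   = grep { /^\Q$key\E$/ } @keysToKeep;    # 0: whole-string match only
my $with_eq    = grep { $_ eq $key } @keysToKeep;      # 0: plain string comparison
```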
Re^3: Parsing an Array of Hashes, Modifying Key/Value Pairs by librarat (Novice) on May 12, 2013 at 21:11 UTC
Good catch! When I'm playing with regexes, I use gskinner's RegExr test tool (as I'm, without a doubt, no wizard). The regex fail in the OP was very much glossed over by me, as I knew it wasn't doing anything that was getting me toward my end goal. http://gskinner.com/RegExr/
-- Libra
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs by Laurent_R (Canon) on May 12, 2013 at 23:10 UTC
Hi,
> I am a new Perl convert, still trying to master the basics of the language. Coming from Python,
I went through the same transition from Python to Perl nine or ten years ago. Even though I would no longer be able to write Python code much beyond the 'hello world' example, I still think that Python was (and is) a great language. And I would continue to disagree with anyone telling me that Python's significant indentation is improper; my experience tells me that it is quite good, not to say great.
But, of course, this is only a secondary aspect. I remember that, at the time, Python was supposed to be simple, and Perl was supposed to be complicated and intimidating. As proof, the O'Reilly CD-ROM featured half a dozen books on Perl. Well, if you need so many books to master it, it must be too complicated, no? The truth is that having many books from different authors gives you many different perspectives, and that is damn useful.
Well, I could go on, but there is no point... except to say that, even though I gave up Python, it was my favorite programming language until I discovered Perl.
