librarat has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monastery Dwellers,

I am a new Perl convert, still trying to master the basics of the language. Coming from Python, I've had a bit of trouble with the concept of object references, and as such, I've come to ask for some clarification.

I've started out by reading an XML file using XML::Simple, but have been urged to move to XML::LibXML.

My goal is to iterate through the loaded hash and remove key/value pairs from each hash within the array where key names are listed in a pre-defined array.

This is a sample of the XML that I'm loading:

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<smses count="4110">
  <sms protocol="0" address="1234567890" date="1288032888762" type="1" subject="null" body="Hey, I'm out of class." toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" />
  <sms protocol="0" address="0987654321" date="1288032888762" type="1" subject="null" body="What are you up to this afternoon?" toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" />
</smses>

The loaded structure (via Dumper) looks like this:

$VAR1 = {
    'count' => '5509',
    'sms' => [
        {
            'protocol' => '0',
            'locked' => '0',
            'status' => '0',
            'date' => '1288194026703',
            'subject' => 'null',
            'toa' => 'null',
            'sc_toa' => 'null',
            'body' => 'You should come to dinner with us! :)',
            'read' => '1',
            'address' => '(123) 456-7890',
            'type' => '2',
            'service_center' => 'null'
        },
        {
            'protocol' => '0',
            'locked' => '0',
            'status' => '64',
            'date' => '1288224316833',
            'subject' => 'null',
            'toa' => 'null',
            'sc_toa' => 'null',
            'body' => 'INVADER ZIM!!',
            'read' => '1',
            'address' => '0987654321',
            'type' => '2',
            'service_center' => 'null'
        },
Ignoring the actual values of the keys, the above XML loads into a hashref that looks like the Dumper output above. What I'm trying to do is iterate through all of the hashes in the sms array. Within each hash, all I'm trying to do is delete key/value pairs where the key name does NOT appear in a pre-defined list (@keysToKeep).

#! /usr/bin/env perl
use XML::Simple;
use Data::Dumper;
use strict;
use warnings;

########
########### Variables
########
# List of info we want for each message entry
my @keysToKeep = ('date', 'body', 'address', 'type');

# XML Parser
my $xml = new XML::Simple;

########
########### Subroutines
########
# Filepath is passed to load a hashref
# a CLEANED Hashref is returned of the xml we've loaded
sub loadFileToHashRef {
    my $xmlRef = $xml->XMLin($_[0]);

    # Clean our hash
    for my $textRef ( @{$xmlRef->{sms}} ) {
        foreach my $key (keys %{$textRef}) {
            # If your $key is *not* what we're looking for, remove it from the hash
            if ( !grep ($key, @keysToKeep) ) {
                delete $textRef->{$key};
            }
        }
    }

    # Return our hash ref
    return $xmlRef;
}

my $hashRef = loadFileToHashRef("xml/sms-20110704030000.xml");
print Dumper $hashRef;
Now, this code doesn't remove elements from the hash, and I know why: keys only returns a list of the keys, so I'm not actually modifying the hashref. My question becomes this: how do I modify the hashref ($hashRef) directly?

Given that the above doesn't do what I'm trying to do, I started working with 'each'. Here's what I came up with.

#! /usr/bin/env perl
use XML::Simple;
use Data::Dumper;
use strict;
use warnings;
use 5.012;

########
########### Variables
########
# List of info we want for each message entry
my @keysToKeep = ('date', 'body', 'address', 'type');

# XML Parser
my $xml = new XML::Simple;

########
########### Subroutines
########
# Filepath is passed to load a hashref
# a CLEANED Hashref is returned of the xml we've loaded
sub loadFileToHashRef {
    my $xmlRef = $xml->XMLin($_[0]);

    while (($key, $value) = each %{$xmlRef}) {
        print "Key: $key, Value: $value \n";
    }

    # Return our hash ref
    return $xmlRef;
}

my $hashRef = loadFileToHashRef("xml/sms-20110704030000.xml");
print Dumper $hashRef;

However, I continually get the following, and I don't understand why.

Global symbol "$key" requires explicit package name at ./parse.pl line 29.
Global symbol "$value" requires explicit package name at ./parse.pl line 29.
Global symbol "$key" requires explicit package name at ./parse.pl line 31.
Global symbol "$value" requires explicit package name at ./parse.pl line 31.
Execution of ./parse.pl aborted due to compilation errors.

I also now know that this won't do what I'm trying to do either, as the each function doesn't allow for element removal from the hash.

I highly suspect that my PHP and Python ideologies are getting in the way of my understanding of how Perl handles hashes and arrays (as they pertain to references). Ideally, I'm looking for the following feedback

-- Libra

Re: Parsing an Array of Hashes, Modifying Key/Value Pairs
by LanX (Saint) on May 12, 2013 at 18:20 UTC
    > However, I continually get the following, and I don't understand why.
    Global symbol "$key" requires explicit package name at ./parse.pl line 29.
    Global symbol "$value" requires explicit package name at ./parse.pl line 29.

    you forgot my

    while (my ($key, $value) = each %{$xmlRef})

    edit

    For historic reasons, package variables ("global symbols") are the default in Perl; lexical variables were introduced later.

    Under strict, you must explicitly declare with our or my which kind of variable you want ¹)

    This has the advantage over Python's implicit declaration that typos are caught more easily.
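A minimal sketch of the corrected loop, run against a made-up record (the %sms contents are invented for illustration). It also uses the one kind of deletion that perldoc -f each documents as safe during iteration: deleting the key that each has just returned. Adding or deleting any other key mid-iteration leaves the iterator's behaviour unspecified.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample record standing in for one parsed <sms> element
my %sms = (date => '1288032888762', body => 'INVADER ZIM!!',
           address => '0987654321', type => '2', toa => 'null', read => '1');

# Keys to keep, stored as a hash for cheap lookups
my %keep = map { $_ => 1 } qw(date body address type);

# "my" declares $key and $value as lexicals, so strict is satisfied
while (my ($key, $value) = each %sms) {
    # Deleting the pair that each() just returned is documented as safe
    delete $sms{$key} unless $keep{$key};
}

print join(',', sort keys %sms), "\n";   # address,body,date,type
```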

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    update
    ¹) or to provide an "explicit package name": $package::name works w/o our
      Many thanks! This makes much more sense now that I'm thinking about it. Strict is fast becoming my best friend :)
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs
by hdb (Monsignor) on May 12, 2013 at 18:33 UTC

    You could also more actively "keep" the desired fields.

    sub loadFileToHashRef {
        my $xmlRef = $xml->XMLin($_[0]);

        # Clean our hash
        for my $textRef ( @{$xmlRef->{sms}} ) {
            %{$textRef} = map { $_ => $textRef->{$_} } @keysToKeep;
        }

        # Return our hash ref
        return $xmlRef;
    }

    If a key didn't exist in the XML, you will get it back with an undef value.
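A variant of that map, sketched on a hypothetical record with no address attribute: filtering @keysToKeep through exists first means missing keys are simply skipped instead of coming back as undef.

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');

# Hypothetical record that lacks an 'address' attribute
my $textRef = { date => '1288224316833', body => 'INVADER ZIM!!',
                type => '2', toa => 'null', status => '64' };

# Only copy keys that actually exist, so no undef-valued keys appear
%{$textRef} = map  { $_ => $textRef->{$_} }
              grep { exists $textRef->{$_} } @keysToKeep;

print join(',', sort keys %{$textRef}), "\n";   # body,date,type
```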

Re: Parsing an Array of Hashes, Modifying Key/Value Pairs
by LanX (Saint) on May 12, 2013 at 18:36 UTC
    # If your $key is *not* what we're looking for, remove it from the hash
    if ( !grep ($key, @keysToKeep) )

    Well, what you want is something like Python's in operator, but the Perl way to check membership in a set is to use a hash.

    So, with a hash %keysToKeep built from your list, if ($keysToKeep{$key}) would do it most efficiently.
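To spell out the step that sentence assumes, %keysToKeep has to be built from @keysToKeep first (Perl keeps arrays and hashes in separate namespaces, so both can share a name). A minimal sketch with an invented record:

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');

# Build the lookup set once: each wanted key maps to a true value
my %keysToKeep = map { $_ => 1 } @keysToKeep;

# Hypothetical record
my $textRef = { date => '1288194026703', body => 'Dinner?', address => '(123) 456-7890',
                type => '2', protocol => '0', service_center => 'null' };

# keys() returns its list up front, so deleting inside this loop is fine
for my $key (keys %{$textRef}) {
    # A hash lookup replaces the linear grep over @keysToKeep
    delete $textRef->{$key} unless $keysToKeep{$key};
}
```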

    There is also the newer smart-match operator ~~ that can act like in, but it's still somewhat experimental:

    DB<137> 1 ~~ [1,2,3]
    => 1
    DB<138> "a" ~~ [1,2,3]
    => ""
    DB<139> "4" ~~ [1,2,3]
    => ""
    DB<140> "2" ~~ [1,2,3]
    => 1

    edit

    Anyway, using a hash slice is the fastest way to filter keys:

    DB<145> %h=(a=>1,b=>2,c=>3)
    => ("a", 1, "b", 2, "c", 3)
    DB<146> @keysToDelete=qw/b c d/
    => ("b", "c", "d")
    DB<147> delete @h{@keysToDelete}
    => (2, 3, undef)
    DB<148> \%h
    => { a => 1 }

    update

    or

    DB<153> %h=(a=>1,b=>2,c=>3)
    => ("a", 1, "b", 2, "c", 3)
    DB<154> @keysToKeep=qw/a e/
    => ("a", "e")
    DB<155> %h2=()
    DB<156> @h2{@keysToKeep}=@h{@keysToKeep}
    => (1, undef)
    DB<157> \%h2
    => { a => 1, e => undef }

    but please notice how key "e" has now sprung into existence, which is no problem if the keys you copy are guaranteed to be a subset of the existing ones.
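If the keys you copy aren't guaranteed to be a subset, one way to avoid the phantom "e" is to filter through exists first. This sketch assumes Perl 5.20 or later for the key/value slice syntax %h{...}; on older perls the same filtered list works with the @h2{...} = @h{...} form shown above.

```perl
use strict;
use warnings;
use 5.020;   # key/value hash slices (%h{...}) need Perl 5.20 or later

my %h          = (a => 1, b => 2, c => 3);
my @keysToKeep = qw/a e/;

# Keep only the wanted keys that actually exist, then take a key/value slice
my %h2 = %h{ grep { exists $h{$_} } @keysToKeep };

# %h2 is now (a => 1); "e" was filtered out instead of springing into existence
```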

    Cheers Rolf

Re: Parsing an Array of Hashes, Modifying Key/Value Pairs
by NetWallah (Canon) on May 12, 2013 at 18:35 UTC
    This is just a hunch, and I'm too lazy to benchmark it to prove it, but I think you may be better off simply copying the key-value pairs you want to KEEP to a different hashref, as opposed to attempting to delete the ones you don't want.

    This would be easier programmatically, as well as potentially more CPU-efficient, at a negligible memory cost.
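A sketch of that copy-what-you-keep approach, assuming a structure shaped like the Dumper output in the question (the records themselves are made up). Each message becomes a new, smaller hashref, and nothing is deleted from the originals.

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');

# Hypothetical structure shaped like the XML::Simple output above
my $xmlRef = {
    count => 2,
    sms   => [
        { date => '1', body => 'hi', address => '555', type => '1', toa => 'null' },
        { date => '2', body => 'yo', type => '2', read => '1' },   # no address
    ],
};

my @cleaned;
for my $old (@{ $xmlRef->{sms} }) {
    my %new;   # a fresh hash on every iteration
    for my $key (@keysToKeep) {
        # Copy only the wanted pairs that are actually present
        $new{$key} = $old->{$key} if exists $old->{$key};
    }
    push @cleaned, \%new;
}

# Swap the cleaned copies in
$xmlRef->{sms} = \@cleaned;
```

The old hashes are freed once nothing refers to them any more, so the extra memory is only needed while the copy is being built.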

    BTW - welcome to the Monastery - and, that is an excellent first post (++)!

                 "I'm fairly sure if they took porn off the Internet, there'd only be one website left, and it'd be called 'Bring Back the Porn!'"
            -- Dr. Cox, Scrubs

      NetWallah,

      Thank you for the tips! I've managed to dig through a bit further and make more sense of what was confusing me. I wound up benching your suggestion against removing key pairs and you were right. Marginally more memory usage, but pretty well moot.

      I've got two follow-up questions, though:

      1. Can you provide me something (link, book, quick write-up, etc) about how to tell while you're coding whether or not you're working with a reference? Or is that just something I'll pick up as I write more code?
      2. What's the difference between (as seen on line 33 in the code below)
        • ${$smsHash->{date}}
        • %smsHash['date']

      Minus the error I'm getting, this is the solution to my initial query.

      As I'm still uncertain about strict references, the solution to this doesn't jump out at me (as I *think* I'm pretty strict about what data I want and in which order) ☺

      Can't use string ("1288032888762") as a SCALAR ref while "strict refs" in use at ./parse.pl line 33.

      Cheers!

      -- Libra

        > Can you provide me something (link, book, quick write-up, etc) about how to tell while you're coding whether or not you're working with a reference? Or is that just something I'll pick up as I write more code?

        see perlref and ref

        Perl uses sigils and (de)reference operators to show whether you're dealing with a ref.
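A small sketch using ref, which also reproduces the "Can't use string ... as a SCALAR ref" error quoted above. The record is hypothetical; the point is that $smsHash->{date} already holds a plain string, so wrapping it in ${...} asks Perl to treat that string as a variable name (a symbolic reference), which use strict forbids.

```perl
use strict;
use warnings;

# Hypothetical record; values are plain strings, not references
my $smsHash = { date => '1288032888762', body => 'INVADER ZIM!!' };

# ref() returns '' for a non-reference, or the kind of thing referenced
print ref($smsHash), "\n";                                      # HASH
print ref($smsHash->{date}) eq '' ? "not a ref\n" : "ref\n";    # not a ref

# The value is a string, so use it directly...
my $date = $smsHash->{date};

# ...whereas ${ ... } on that string is a symbolic reference, which
# "strict refs" rejects at run time
my $ok = eval { my $bad = ${ $smsHash->{date} }; 1 };
print $ok ? "deref worked\n" : "error: $@";
```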

        The default for @arrays and %hashes is what I call their "list form", i.e. you always copy the content when assigning them:

        DB<100> @a=(1,2,3)
        => (1, 2, 3)
        DB<101> @b=@a
        => (1, 2, 3)
        DB<102> $b[0]=42
        => 42
        DB<103> $a[0]   # @a didn't change
        => 1

        As you noticed, $a[0] accesses the first element of @a; you need the $ sigil because $a[0] is a scalar, not an array.

        But Python's behavior is to default to the "reference form" for arrays and hashes, and to the non-reference form for simple variables:

        >>> a=[1,2,3]
        >>> b=a
        >>> b[0]=42
        >>> a[0]
        42

        To achieve this behavior in Perl (e.g. to be able to nest data), you need references, and refs are $scalars:

        DB<104> $a=[1,2,3]
        => [1, 2, 3]
        DB<105> $b=$a
        => [1, 2, 3]
        DB<106> $b->[0]=42
        => 42
        DB<107> $a->[0]   # original changed
        => 42

        You need the explicit dereference operator -> to distinguish between the array @a and the array referenced by $a (yes, they are different) ¹.

        There is also the reference operator \ to take a reference to a variable.

        This way the above example can be written

        DB<108> \@a
        => [1, 2, 3]
        DB<109> $b=\@a
        => [1, 2, 3]
        DB<110> $b->[0]=666
        => 666
        DB<111> $a[0]   # original changed
        => 666

        There is an alternative way to dereference, by prepending the intended sigil:

        DB<112> @$b
        => (666, 2, 3)   # now a (list), not an [array-ref]

        but you need to be careful about precedence when dealing with nested structures.
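A short sketch of that precedence issue, on a made-up nested structure: a leading sigil applies only to the simple scalar right after it, so braces are needed whenever the thing being dereferenced is itself an expression.

```perl
use strict;
use warnings;

# Hypothetical nested structure: a hashref holding an arrayref of hashrefs
my $data = { count => 2, sms => [ { type => 1 }, { type => 2 } ] };

# Braces make the order explicit: fetch $data->{sms} first,
# then dereference that arrayref as a list
my @messages = @{ $data->{sms} };

# Without the braces, @$data->{sms} would be read as @{$data}->{sms}:
# the @ would apply to $data alone, not to $data->{sms}.

# Arrow chains read left to right; the arrow between subscripts is optional
my $second_type = $data->{sms}[1]{type};   # same as $data->{sms}->[1]->{type}
```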

        There is more to say, but this should be enough for the beginning.

        Please ask if you have further questions.

        Cheers Rolf

        ¹) There are no special sigils to denote an array_ref or a hash_ref in Perl, which is unfortunate IMO, because one could avoid deref operators this way! But the keyboard is already exhausted, and things like €£¢ are difficult to type. I use a naming convention to mark them instead, e.g. $a_names for an array_ref of names.

        I think the syntax you are looking for is:
        push @textsToKeep,
            [ $smsHash->{date}, $smsHash->{body}, $smsHash->{address}, $smsHash->{type} ];   # Adding ONE entry, which is an array-ref.
        This makes @textsToKeep an AOA (Array of Arrays), which, technically, is an array of array-refs.

        You can access individual items as follows:

        my $body_for_third_item     = $textsToKeep[2][1];   # Indexes start at 0
        my $address_for_second_item = $textsToKeep[1][2];
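A self-contained sketch of building and indexing such an AoA, with invented records. It uses a hash slice through the reference, @{$smsHash}{...}, which yields the same four values as listing the $smsHash->{...} lookups one by one.

```perl
use strict;
use warnings;

# Hypothetical records standing in for the parsed <sms> hashes
my @sms = (
    { date => '100', body => 'first',  address => '111', type => '1', toa => 'null' },
    { date => '200', body => 'second', address => '222', type => '2', toa => 'null' },
    { date => '300', body => 'third',  address => '333', type => '1', toa => 'null' },
);

my @textsToKeep;
for my $smsHash (@sms) {
    # Hash slice through the reference: four fields in a fixed order
    push @textsToKeep, [ @{$smsHash}{qw(date body address type)} ];
}

my $body_for_third_item     = $textsToKeep[2][1];   # 'third'
my $address_for_second_item = $textsToKeep[1][2];   # '222'
```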

                     "I'm fairly sure if they took porn off the Internet, there'd only be one website left, and it'd be called 'Bring Back the Porn!'"
                -- Dr. Cox, Scrubs

Re: Parsing an Array of Hashes, Modifying Key/Value Pairs
by kcott (Archbishop) on May 13, 2013 at 05:46 UTC

    G'day librarat,

    "My goal is to iterate through the loaded hash and remove key/value pairs from each hash within the array where key names are listed in a pre-defined array."

    I'd probably tackle that like this:

    perl -Mstrict -Mwarnings -e '
    use Data::Dumper;
    my @keeps = qw{a b};
    my %to_keep = map { $_ => 1 } @keeps;
    my $data = {
        count => 5509,
        sms => [{a => 1, b => 2, c => 3, d => 4}, {b => 2, c => 3, e => 5}]
    };
    print Dumper $data;
    for my $sms_ref (@{$data->{sms}}) {
        delete @$sms_ref{ grep { ! $to_keep{$_} } keys %$sms_ref };
    }
    print Dumper $data;
    '
    $VAR1 = {
      'count' => 5509,
      'sms' => [
        { 'c' => 3, 'a' => 1, 'b' => 2, 'd' => 4 },
        { 'e' => 5, 'c' => 3, 'b' => 2 }
      ]
    };
    $VAR1 = {
      'count' => 5509,
      'sms' => [
        { 'a' => 1, 'b' => 2 },
        { 'b' => 2 }
      ]
    };

    Update: Removed the exists from ... grep { ! exists $to_keep{$_} } .... It was redundant - see discussion below. Thanks ++LanX.

    -- Ken

      G'day kcott,

      just wondering:

      my @keeps = qw{a b};
      my %to_keep = map { $_ => 1 } @keeps;
      ...
      delete @$sms_ref{ grep { ! exists $to_keep{$_} } keys %$sms_ref };

      using exists is a bit redundant if you already initialize the hash with true values...

      so either dropping the exists:

      ...
      delete @$sms_ref{ grep { ! $to_keep{$_} } keys %$sms_ref };

      or initializing with undef:

      my %to_keep;
      @to_keep{ qw{a b} } = ();
      ...

      should have the same effect.
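A quick check of both variants on invented data; it also shows why the undef-initialized hash needs exists (a plain truth test would treat every key as unwanted).

```perl
use strict;
use warnings;

# Variant 1: true values, tested with a plain lookup
my %to_keep_true = map { $_ => 1 } qw{a b};

# Variant 2: keys present but values undef, so exists is required
my %to_keep_undef;
@to_keep_undef{ qw{a b} } = ();

# Hypothetical record
my %sms = (a => 1, b => 2, c => 3, d => 4);

my @drop_true  = grep { ! $to_keep_true{$_} }         sort keys %sms;
my @drop_undef = grep { ! exists $to_keep_undef{$_} } sort keys %sms;

# Both lists are (c, d)
```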

      Or am I missing something?

      Cheers Rolf

        Thanks for picking that up - I've updated my node. I suspect my intent in including the exists was to avoid autovivification; however, no autovivification occurs there anyway. I think it might have done, long ago (Perl3/4?).
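A small sketch of where autovivification does and doesn't happen in current Perl, using made-up hashes: a single-level rvalue lookup leaves the hash alone, while a nested rvalue lookup creates the intermediate level.

```perl
use strict;
use warnings;

my %to_keep = (a => 1, b => 1);

# A plain rvalue lookup of a missing key does not create it
my $flag = $to_keep{zzz};
my $created_flat = exists $to_keep{zzz} ? 1 : 0;      # 0

# A nested lookup does autovivify the intermediate level
my %nested;
my $deep = $nested{x}{y};
my $created_nested = exists $nested{x} ? 1 : 0;       # 1 ($nested{x} is now {})
```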

        -- Ken

Re: Parsing an Array of Hashes, Modifying Key/Value Pairs
by hdb (Monsignor) on May 12, 2013 at 18:22 UTC

    Your usage of grep is incorrect. In your first script, use

    if ( !grep(/$key/, @keysToKeep) )

    to make it work.
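Why the original test never deletes anything, sketched with a made-up key: with grep(EXPR, LIST), EXPR is evaluated once per element with $_ set to that element, but $key on its own ignores $_ entirely.

```perl
use strict;
use warnings;

my @keysToKeep = ('date', 'body', 'address', 'type');
my $key = 'protocol';

# EXPR is just $key, a true (non-empty, non-"0") string,
# so every element "matches"
my $count        = grep($key, @keysToKeep);    # 4
my $would_delete = !grep($key, @keysToKeep);   # false, so nothing is deleted

# Comparing against $_ (the current element) is what was intended
my $found = grep { $_ eq $key } @keysToKeep;   # 0, so 'protocol' would be deleted
```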

      > if ( !grep(/$key/, @keysToKeep) )

      Careful! Without anchors you'll also match on substrings, so better /^$key$/.

      Anyway, try to avoid regexes when possible and use eq:

      if ( not grep { $key eq $_ } @keysToKeep )
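Another option, assuming List::Util 1.33 or later (bundled with recent perls), is any, which reads much like Python's any() and stops at the first match. For many repeated checks, the hash lookup suggested elsewhere in this thread is still cheaper. A sketch with an invented record:

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() needs List::Util 1.33 or later

my @keysToKeep = ('date', 'body', 'address', 'type');
my %record     = (date => 1, body => 2, toa => 'null', protocol => 0);   # hypothetical record

for my $key (keys %record) {
    # any() returns true as soon as the block is true for one element
    delete $record{$key} unless any { $_ eq $key } @keysToKeep;
}
```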

      Cheers Rolf

      edit

      corrected code

        Good catch!
        When I'm playing with regexes, I use GSkinner's test suite (as I'm, without a doubt, no wizard).
        The regex fail in the OP was very much glossed over by me, as I knew it wasn't doing anything that was getting me toward my end goal.
        http://gskinner.com/RegExr/

        -- Libra
Re: Parsing an Array of Hashes, Modifying Key/Value Pairs
by Laurent_R (Canon) on May 12, 2013 at 23:10 UTC

    Hi,

    I am a new Perl convert, still trying to master the basics of the language. Coming from Python,

    I went through the same transition from Python to Perl nine or ten years ago. Even though I would no longer be able to write Python code much beyond the 'hello world' example, I still think that Python was (and is) a great language. And I would continue to disagree with anyone telling me that the Python indentation thing is improper. My experience tells me that it is quite good, not to say great.

    But, of course, this is only a secondary aspect.

    I remember that, at the time, Python was supposed to be simple, and Perl was supposed to be complicated and intimidating. As proof, the O'Reilly CD-ROM featured half a dozen books on Perl. Well, if you need so many books to master it, it must be too complicated, no? The truth is that having many books from different authors gives you many different perspectives. And this is damn useful.

    Well, I could go on, but there is no point... Except to say that, even though I gave up Python, it was my favorite programming language until I discovered Perl.