in reply to Re: Having an Issue updating hash of hashes
in thread Having an Issue updating hash of hashes

I would love a way to neaten up the splits to get the proper data for the fields! I did it this way because it works, but I definitely am not getting brownie points for beauty.

Re^3: Having an Issue updating hash of hashes
by Laurent_R (Canon) on Jul 05, 2014 at 19:07 UTC
    OK, one possible way:
    while (my $row = <LIST>) {
        my ($id, $first, $last, $age) = (split /[\s=]/, $row)[1, 3, 5, 7];
        push @people, { 'id' => "$id", 'first' => "$first", 'last' => "$last", 'age' => "$age" };
    }
    It could be done in an even shorter way (one single instruction), but I do not think this would be a good idea, because it would become somewhat more difficult to understand and to maintain. Whereas I think the above remains fairly clear and quite easy to understand and to maintain. Using a regex could also do the job, but I doubt it could be clearer or simpler than the above.
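    To make the index list [1, 3, 5, 7] less magical, here is a minimal sketch of what the split returns. The sample record is an assumption: it uses the "ID=1 First=John Last=Doe AGE=42" layout that appears elsewhere in this thread.

```perl
use strict;
use warnings;

# Assumed sample record, one line as it would be read from the file.
my $row = "ID=1 First=John Last=Doe AGE=42\n";

# Splitting on a single whitespace character or '=' alternates labels and values:
# ('ID', '1', 'First', 'John', 'Last', 'Doe', 'AGE', '42')
# The empty field after the trailing newline is dropped by split.
my @fields = split /[\s=]/, $row;

# The odd indices 1, 3, 5, 7 are therefore the values.
my ($id, $first, $last, $age) = @fields[1, 3, 5, 7];

print join('|', @fields), "\n";
print "$id $first $last $age\n";
```

    Note that [\s=] matches exactly one character, so two consecutive spaces in the input would produce an empty field and shift the indices. If that can happen in your data, /[\s=]+/ is probably the safer separator.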
      'id' => "$id",

      Quite frequently around the monastery of late, I've noticed this practice (idiom? tic?) of (apparently) needlessly interpolating a scalar into a string. I don't understand it. Is there any benefit to be had from it? Where does it originate?

      Update: When I originally posted this, I went looking for some examples of this 'frequent' practice and, of course, couldn't find any, got annoyed, gave up. Here are some recent examples: Perl function calls. and its cousin Perl : Convert a monolithic code to a function. They are both by grasshopper user786, but I'm sure he or she is not the only 'offender'. As you will see in the code of the linked posts, none of the interpolation has anything to do with avoidance of numification.
      Also: System output variables and newline:  print "$dcr"; statement at end;

        The only thing that springs to mind is to preserve any leading zeroes.

        You must always remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
        Yes, you are right, AnomalousMonk, for populating the inner hashes, I just blindly copied the OP's code, but AFAICS, quotes are not needed on either side of "fat commas" in the above hash definition, so that the relevant code could be changed to:
        push @people, { id => $id, first => $first, last => $last, age => $age, };
        They are not needed on the left side of "fat commas" (=>) because the fat comma automatically quotes a simple identifier on its left, and they are not needed on the right side here because the values are already stored in variables. And, BTW, with respect to boftx's suggested explanation, preserving leading zeros doesn't seem to be a sufficient reason: any leading zero(s) in these variables are preserved anyway, and Perl will not "numify" the hash values, as shown in this example under the Perl debugger:
          DB<13> $s = "003";
          DB<14> %h = (string => $s);
          DB<15> x %h
        0  'string'
        1  003
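        The same point as a small script, for anyone who prefers that to a debugger session. This is only a sketch; the hash keys are made up for the illustration. A copy keeps the string as it is, and the leading zeros only disappear when the value is used in an explicitly numeric operation.

```perl
use strict;
use warnings;

my $s = "003";

my %h = (
    plain  => $s,        # plain copy: still the string '003'
    quoted => "$s",      # interpolated copy: also '003', the quotes change nothing
    number => $s + 0,    # numeric operation: the result is the number 3
);

# Print the three entries in key order.
print "$_ => $h{$_}\n" for sort keys %h;
```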
      I completely agree with your stance on readability. I manage some code written by my predecessor. While he was very creative in writing short, compact and efficient code, it can be very hard to follow if you're just looking in. Here is the final result:
      #!/usr/bin/perl
      use Data::Dumper;
      use strict;
      use warnings;

      getPeople();

      sub getPeople {
          my $file = 'list.txt';
          my $people;
          open( LIST, "< $file" ) or die "Can't open $file : $!";
          while (my $row = <LIST> ) {
              my ($id, $first, $last, $age) = (split /[\s=]/, $row)[1, 3, 5, 7];
              $people -> { $id } = { id => $id, first => $first, last => $last, age => $age };
          }
          print Dumper($people);
          print "The person with ID 3 is $people->{3}{first} $people->{3}{last}\n";
          close LIST;
      }
        This seems to be fine, but I do not understand why you insist on using a reference to a hash of hashes. Why not simply a hash of hashes? Something like this:
        sub getPeople {
            my $file = 'list.txt';
            my %people;
            open( LIST, "< $file" ) or die "Can't open $file : $!";
            while (my $row = <LIST> ) {
                my ($id, $first, $last, $age) = (split /[\s=]/, $row)[1, 3, 5, 7];
                $people{$id} = { id => $id, first => $first, last => $last, age => $age };
            }
            print Dumper(\%people);    # pass a reference to the hash
            print "The person with ID 3 is $people{3}{first} $people{3}{last}\n";
            close LIST;
        }
        An additional remark is that both your $people hashref and my %people hash are lexically scoped to the subroutine and not accessible outside the sub. I guess you will probably need to return it to the calling routine or share it one way or another with the caller if it needs to be used outside the sub.
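        One possible way to hand the data back to the caller is to return a reference to the hash. This is a sketch only: the sub name get_people, the filehandle parameter and the second sample record are my assumptions, and an in-memory filehandle stands in for list.txt so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical variant: takes an open filehandle and returns a reference
# to the hash of hashes instead of printing inside the sub.
sub get_people {
    my ($fh) = @_;
    my %people;
    while (my $row = <$fh>) {
        my ($id, $first, $last, $age) = (split /[\s=]/, $row)[1, 3, 5, 7];
        $people{$id} = { id => $id, first => $first, last => $last, age => $age };
    }
    return \%people;
}

# In-memory sample data standing in for list.txt (assumed record layout).
my $data = "ID=1 First=John Last=Doe AGE=42\nID=3 First=Jane Last=Roe AGE=37\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my $people = get_people($fh);
close $fh;

# The caller now owns the data and can use it outside the sub.
print "The person with ID 3 is $people->{3}{first} $people->{3}{last}\n";
```

        Returning a reference avoids copying the whole hash, and the caller is free to dereference it with %$people if a plain hash is more convenient.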

        As a final note, I think you should compare your indentation with mine: I think that mine shows more clearly that the code after the end of the while loop still belongs to the subroutine definition. Getting into the habit of properly indenting your code will save you a lot of debugging time when things get a bit complicated.

Re^3: Having an Issue updating hash of hashes
by AnomalousMonk (Archbishop) on Jul 06, 2014 at 00:07 UTC

    Contrary to Laurent_R's aversion to using a single regex to extract data fields from a record expressed herein, I find it's often both more robust and more maintainable.

    The trick is to combine record validation and record field extraction in one operation. Of course, in the words of the famous witticism, now you have two problems: coming up with a regex to match an entire data record may not be easy (and robustly matching, e.g., a name, even if the nationality domain is well defined, can be quite tricky, so you often end up with a hack like  \S+ as a 'temporary' expedient), but once defined, the regex, properly factored, can be quite clear and fairly easy to maintain.

    The example below takes liberties with names, those tricky devils, and otherwise assumes much about the OPed dataset, but shows the basic idea.

    c:\@Work\Perl>perl -wMstrict -le
    "my $record = do {
         my $id    = qr{ \d+ }xms;
         my $name  = qr{ [[:upper:]] [[:lower:]]+ }xms;
         my $first = $name;
         my $last  = $name;
         my $age   = qr{ \d+ }xms;
         qr{ \A ID= ($id) \s+ First= ($first) \s+ Last= ($last) \s+ AGE= ($age) \z }xms;
       };
     ;;
     for my $rec ('ID=1 First=John Last=Doe AGE=42', @ARGV) {
         my ($id, $first_name, $last_name, $age) = $rec =~ m{ $record }xms
             or die qq{malformed record: '$rec'};
         print qq{id '$id' first '$first_name' last '$last_name' age '$age'};
     }
    "
    "ID=2 First=Joe Last=42 AGE=Doe"
    id '1' first 'John' last 'Doe' age '42'
    malformed record: 'ID=2 First=Joe Last=42 AGE=Doe' at -e line 1.
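    For use in a script rather than a one-liner, the same validate-and-extract idea might look like the sketch below. It swaps the positional captures for named captures (Perl 5.10 or later), which is a variation and not what the one-liner does. The sub name parse_record is hypothetical, and the name pattern is the same simplification as before.

```perl
use strict;
use warnings;

# Sub-patterns, factored out so each can be tightened independently.
my $num  = qr{ \d+ }xms;
my $name = qr{ [[:upper:]] [[:lower:]]+ }xms;    # simplistic on purpose

# One regex that both validates the whole record and captures its fields.
my $record = qr{
    \A
    ID=    (?<id>    $num  ) \s+
    First= (?<first> $name ) \s+
    Last=  (?<last>  $name ) \s+
    AGE=   (?<age>   $num  )
    \z
}xms;

# Hypothetical helper: returns a hashref of fields, or undef for a malformed record.
sub parse_record {
    my ($rec) = @_;
    return undef unless $rec =~ $record;
    my %fields = %+;    # copy the named captures before they are overwritten
    return \%fields;
}

for my $rec ('ID=1 First=John Last=Doe AGE=42', 'ID=2 First=Joe Last=42 AGE=Doe') {
    my $p = parse_record($rec);
    print $p
        ? "id '$p->{id}' first '$p->{first}' last '$p->{last}' age '$p->{age}'\n"
        : "malformed record: '$rec'\n";
}
```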
      Hi AnomalousMonk,

      Contrary to Laurent_R's aversion to using a single regex to extract data fields from a record ...

      I have no aversion whatsoever to regexes; I actually use them very often and I love them. ;-)

      I was only saying that, in this specific case, the split function (which, BTW, explicitly uses a regex here) would IMHO lead to more concise and probably clearer code. Your suggested code definitely achieves clarity and ease of maintenance, but not concision.

      If the aim is concision, then the regex could be something like this (tested under the Perl debugger):

        DB<17> $line = "ID=1 First=John Last=Doe AGE=42";
        DB<18> $word = qr/[a-zA-Z]+/;
        DB<19> ($id, $first, $last, $age) = $line =~ /^ID=(\d+)\s+First=($word)\s+Last=($word)\s+AGE=(\d+)\s*$/;
        DB<20> x ($id, $first, $last, $age)
      0  1
      1  'John'
      2  'Doe'
      3  42
      or even in one single line:
      my ($id, $first, $last, $age) = $line =~ /^ID=(\d+)\s+First=([a-zA-Z]+)\s+Last=([a-zA-Z]+)\s+AGE=(\d+)\s*$/;
      which is now quite concise, but arguably less clear and maintainable than the simple split I originally suggested. Admittedly, the above regex does a bit more data validation than the split version, but whether you actually need validation depends on the situation (essentially: where is the input data coming from?). Sometimes you don't need it (e.g. you produced the data yourself and you really know what it looks like), sometimes you do, and then it can be difficult to figure out how extensive your validation should be. Maybe the $word regex definition should be something like this:
      $word = qr/[A-Z][a-z]+/;
      or maybe simply:
      $word = qr/[a-z]+/i;
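      To get a feel for how much these choices matter, here is a small sketch that runs the three candidate definitions against a few names. The sample names and the accepts helper are mine, chosen for illustration only.

```perl
use strict;
use warnings;

# The three candidate definitions of $word discussed above.
my %candidates = (
    'letters'          => qr/[a-zA-Z]+/,
    'capitalized'      => qr/[A-Z][a-z]+/,
    'case-insensitive' => qr/[a-z]+/i,
);

# Hypothetical helper: true if the whole name matches the candidate pattern.
sub accepts {
    my ($word, $name) = @_;
    return $name =~ /\A$word\z/ ? 1 : 0;
}

# Made-up sample names, including some awkward ones.
my @names = qw(John john McDonald O'Brien);

for my $label (sort keys %candidates) {
    my @ok = grep { accepts($candidates{$label}, $_) } @names;
    print "$label accepts: @ok\n";
}
```

      None of the three accepts O'Brien, and the capitalized version also rejects McDonald, which gives an idea of how quickly name validation gets tricky.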
      Notice that this is opening an entirely different subject. Well, I'll leave it there, as this is getting slightly off-topic.