Contrary to Laurent_R's aversion to using a single regex to extract data fields from a record, expressed herein, I find this approach is often both more robust and more maintainable.

The trick is to combine record validation and record field extraction in one operation. Of course, in the words of the famous witticism, now you have two problems: coming up with a regex to match an entire data record may not be easy. Robustly matching, e.g., a name can be quite tricky even if the nationality domain is well defined, so you often end up with a hack like \S+ as a 'temporary' expedient. Once defined, though, the regex, properly factored, can be quite clear and fairly easy to maintain.

The example below takes liberties with names, those tricky devils, and otherwise assumes much about the OPed dataset, but shows the basic idea.

c:\@Work\Perl>perl -wMstrict -le
"my $record = do {
   my $id    = qr{ \d+ }xms;
   my $name  = qr{ [[:upper:]] [[:lower:]]+ }xms;
   my $first = $name;
   my $last  = $name;
   my $age   = qr{ \d+ }xms;
   qr{ \A ID= ($id) \s+ First= ($first) \s+ Last= ($last) \s+ AGE= ($age) \z }xms;
   };
 ;;
 for my $rec ('ID=1 First=John Last=Doe AGE=42', @ARGV) {
   my ($id, $first_name, $last_name, $age) = $rec =~ m{ $record }xms
     or die qq{malformed record: '$rec'};
   print qq{id '$id' first '$first_name' last '$last_name' age '$age'};
   }
 " "ID=2 First=Joe Last=42 AGE=Doe"
id '1' first 'John' last 'Doe' age '42'
malformed record: 'ID=2 First=Joe Last=42 AGE=Doe' at -e line 1.

Re^4: Having an Issue updating hash of hashes
by Laurent_R (Canon) on Jul 06, 2014 at 09:21 UTC
    Hi AnomalousMonk,

    Contrary to Laurent_R's aversion to using a single regex to extract data fields from a record ...

    I have no aversion whatsoever to regexes; I actually use them very often, and I love them. ;-)

    I was only saying that, in that specific case, the use of the split function (which, BTW, explicitly uses a regex in the case in point) would IMHO lead to more concise and probably clearer code. Your suggested code definitely reaches the aims of clarity and ease of maintenance, but not the aim of concision.
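
    For comparison, a split-based extraction might look roughly like the sketch below. This is only an illustration, not the code from the earlier post: parse_record_split is a hypothetical helper, and it assumes whitespace-separated key=value tokens and performs no validation at all.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): turn a record such as
# 'ID=1 First=John Last=Doe AGE=42' into a hash reference of key/value pairs.
# Assumption: tokens are separated by whitespace and each contains one '='.
sub parse_record_split {
    my ($line) = @_;
    # split ' ' breaks on runs of whitespace; the inner split separates
    # each token into key and value at the first '='.
    my %field = map { split(/=/, $_, 2) } split ' ', $line;
    return \%field;
}

my $rec = parse_record_split('ID=1 First=John Last=Doe AGE=42');
print join(' ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
```

    A token without an '=' would misalign the key/value pairs (or trigger an odd-number-of-elements warning), which is why this form only suits data you trust.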

    If the aim is concision, then the regex could be something like this (tested under the Perl debugger):

    DB<17> $line = "ID=1 First=John Last=Doe AGE=42";
    DB<18> $word = qr/[a-zA-Z]+/;
    DB<19> ($id, $first, $last, $age) = $line =~ /^ID=(\d+)\s+First=($word)\s+Last=($word)\s+AGE=(\d+)\s*$/;
    DB<20> x ($id, $first, $last, $age)
    0  1
    1  'John'
    2  'Doe'
    3  42
    or even in one single line:
    my ($id, $first, $last, $age) = $line =~ /^ID=(\d+)\s+First=([a-zA-Z]+)\s+Last=([a-zA-Z]+)\s+AGE=(\d+)\s*$/;
    which is now quite concise, but arguably less clear and maintainable than the simple split I originally suggested. Admittedly, the above regex does a bit more data validation than the split version, but whether you actually need validation depends on the situation (essentially: where is the input data coming from?). Sometimes you don't need it (e.g. you produced the data yourself and you really know what it looks like); sometimes you do, but it can be difficult to figure out how extensive your validation process should be. Maybe the $word regex definition should be something like this:
    $word = qr/[A-Z][a-z]+/;
    or maybe simply:
    $word = qr/[a-z]+/i;
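
    To see how much these definitions differ in strictness, here is a small sketch that tries each one against a few made-up names. The sample names and the name_ok helper are assumptions for illustration only, not part of the original data.

```perl
use strict;
use warnings;

# The three candidate definitions of $word discussed above.
my %word = (
    'any letters'      => qr/[a-zA-Z]+/,
    'capitalized'      => qr/[A-Z][a-z]+/,
    'case-insensitive' => qr/[a-z]+/i,
);

# Hypothetical helper: returns 1 if $name matches $pattern in full, else 0.
sub name_ok {
    my ($pattern, $name) = @_;
    return $name =~ /\A $pattern \z/x ? 1 : 0;
}

# Made-up sample names, chosen to exercise the differences.
for my $name (qw(John john McDonald O'Brien)) {
    my @verdicts = map {
        sprintf '%s: %s', $_, name_ok($word{$_}, $name) ? 'ok' : 'no'
    } sort keys %word;
    print "$name => ", join(', ', @verdicts), "\n";
}
```

    Only the capitalized form rejects "john" and "McDonald", and none of the three accepts "O'Brien", which shows how quickly name validation gets complicated.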
    Notice that this is opening an entirely different subject. Well, I'll leave it there, as this is getting slightly off-topic.