Gyro has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
I am trying to load a hash to serve as a code-translation table for a much larger file. In an effort to simplify (really, I just wanted to see whether this could be done), I came up with the following code, some sample data (tab-delimited), and the output:
#!/usr/bin/perl -w
use strict;
use warnings;

my %CpmRC;
while (<DATA>) {
    %CpmRC = /^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;
}

while (my ($key, $value) = each(%CpmRC)) {
    print "$key,$value\n";    # Print contents of the hash
}

__DATA__
030 003 1234567 4403 comments
031 003 1234567 4404 comments
032 003 1234567 4405 comments

OUTPUT:
032,4405
Out of curiosity, why does this work:

    /^(\d+)\W\d+\W\d+\W(\d+)\W\w+/; $CpmRC{$1}=$2;

while this only keeps the last two captured values (the pair from the final line):

    %CpmRC = /^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;
I am using Perl 5.6.1 on both Solaris and NT. Any comments and/or improvements on this will be appreciated.

Thanks,
Gyro

Re: Loading a Hash directly from a Regex
by merlyn (Sage) on Feb 14, 2002 at 15:59 UTC
    Assigning to a hash using %this_hash = ... always overwrites the entire hash.
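    A minimal sketch of the difference, assuming the three sample records are held in an array (`@lines`, `$re`, `%overwritten` and `%accumulated` are names introduced here for illustration) instead of being read from DATA. The first loop mirrors the original code; the second assigns one hash element per line, so earlier keys survive:

```perl
use strict;
use warnings;

# Sample records from the question (hypothetical in-memory copy of __DATA__).
my @lines = (
    "030\t003\t1234567\t4403\tcomments",
    "031\t003\t1234567\t4404\tcomments",
    "032\t003\t1234567\t4405\tcomments",
);
my $re = qr/^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;

# List assignment to the whole hash: every pass throws away the old contents.
my %overwritten;
for (@lines) {
    %overwritten = /$re/;
}

# Element assignment: every successful match adds one key and keeps the rest.
my %accumulated;
for (@lines) {
    $accumulated{$1} = $2 if /$re/;
}

print "overwritten: ", join(',', map { "$_=$overwritten{$_}" } sort keys %overwritten), "\n";
print "accumulated: ", join(',', map { "$_=$accumulated{$_}" } sort keys %accumulated), "\n";
```

    This prints "overwritten: 032=4405" followed by "accumulated: 030=4403,031=4404,032=4405".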

    Additionally, the code of:

    /^(\d+)\W\d+\W\d+\W(\d+)\W\w+/; $CpmRC{$1}=$2;
    is dangerous, because when the match fails, you get the previous value of $1 and friends. Always do this in the conditional based on the match:
    if (/^(\d+)\W\d+\W\d+\W(\d+)\W\w+/) {
        $CpmRC{$1} = $2;
    }
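    A small sketch of the stale-capture problem, assuming a hypothetical malformed record ("BAD LINE") follows a good one; `$good`, `$bad`, `$re` and the stale-value variables are illustrative names. A failed match leaves $1 and $2 holding the previous successful captures, so an unguarded `$CpmRC{$1}=$2` would silently store the old pair again. The guarded loop stores only when the current match succeeds:

```perl
use strict;
use warnings;

my $re = qr/^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;

# A good record followed by a hypothetical malformed one.
my $good = "030\t003\t1234567\t4403\tcomments";
my $bad  = "BAD LINE";

$good =~ $re;    # succeeds: $1 is "030", $2 is "4403"
$bad  =~ $re;    # fails, and $1/$2 are NOT reset
my ($stale_key, $stale_value) = ($1, $2);
print "after the failed match: $stale_key => $stale_value\n";

# Guarded version: only touch the hash when this particular match succeeded.
my %CpmRC;
for ($good, $bad, "032\t003\t1234567\t4405\tcomments") {
    if (/$re/) {
        $CpmRC{$1} = $2;
    }
}
print "$_,$CpmRC{$_}\n" for sort keys %CpmRC;
```

    The first line of output shows the leftover "030 => 4403"; the guarded loop stores only 030 and 032 and skips the bad record.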

    -- Randal L. Schwartz, Perl hacker

Re: Loading a Hash directly from a Regex
by japhy (Canon) on Feb 14, 2002 at 15:57 UTC
    It doesn't work because you are setting the entire hash each time. That's just what %foo = ... does. You could do it all at once via:
    %CpmRC = map /^(\d+)\W\d+\W\d+\W(\d+)/, <DATA>;
    # or...
    %CpmRC = map { (/\d+/g)[0,3] } <DATA>;    # reads a bit nicer
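    A self-contained sketch of both one-liners, assuming the records come from an array (`@lines`, `%by_captures` and `%by_slice` are illustrative names) rather than <DATA>, since <DATA> can only be read through once. In list context each match returns its captures, and map flattens all the pairs into a single key/value list that is assigned to the hash in one go:

```perl
use strict;
use warnings;

# Sample records, with trailing newlines as <DATA> would return them.
my @lines = (
    "030\t003\t1234567\t4403\tcomments\n",
    "031\t003\t1234567\t4404\tcomments\n",
    "032\t003\t1234567\t4405\tcomments\n",
);

# Each match yields ($1, $2); map concatenates the pairs for one hash assignment.
my %by_captures = map /^(\d+)\W\d+\W\d+\W(\d+)/, @lines;

# Alternative: collect every run of digits with /g and keep the 1st and 4th.
my %by_slice = map { (/\d+/g)[0, 3] } @lines;

print "$_,$by_captures{$_}\n" for sort keys %by_captures;
```

    With the anchored-regex form, a line that fails to match returns an empty list to map, so it is simply skipped instead of reusing stale captures.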

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a (from-home) job
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Loading a Hash directly from a Regex
by hakkr (Chaplain) on Feb 14, 2002 at 16:09 UTC
    Minor point, but you don't need -w as well as use warnings.
    /^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;
    is operating on the $_ value and is equivalent to
    $_=~ /^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;
    which is more readable
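    A quick check of that equivalence, using one sample record from the question (`$re` and `@results` are illustrative names). foreach aliases $_ to the current element, so the implicit and explicit forms match the same string and return the same captures:

```perl
use strict;
use warnings;

my $re = qr/^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;
my @results;

for ("030\t003\t1234567\t4403\tcomments") {
    my @implicit = /$re/;          # matches against $_ by default
    my @explicit = $_ =~ /$re/;    # spells out the same target
    push @results, "@implicit", "@explicit";
}
print "$_\n" for @results;
```

    Both lines of output read "030 4403".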
      $_=~ /^(\d+)\W\d+\W\d+\W(\d+)\W\w+/;
      which is more readable
      That's debatable. If someone writes $_ =~, I am forced to wonder whether something is going on that requires them not to simply accept the default. It's misleading. I don't see how that makes it "more readable".

      Regex matches against $_ are the common form; =~ is the exception. Using the exception to perform the common operation is a bit like using the emergency handbrake at every stop light.

      -- Randal L. Schwartz, Perl hacker

        But the complete novice or maintainer may not know the footbrake exists.
        It's a fair point around here, but to a novice or non-Perl programmer, an = sign in the statement indicates that an assignment is going on. One of the problems (and benefits) of Perl is the number of unique programming concepts and defaults you need to learn. In my opinion Perl code is not of a high standard unless it's unreadable to a non-Perl programmer. Anyway, it's an old argument. I think even $_ is unreadable, but I realise it has its benefits, especially for the experts :)