Evanovich has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone-- Pardon my "newbieness"--this question seems like it should be pretty easily solved, but I can't seem to think about it right. I have a file that contains two columns: a name column and a column of numbers. I also have an array that contains a list of names. The array contains almost all of the names in the file, as well as many others that are not in the file. Here's what I mean exactly:
FILE            ARRAY
name1  4.5      name1
name3  3.4      name2
name5  6.5      name3
name4 -7.9      name4
name6  3.2      name5
And I want to end up with an array of numbers, sorted by the names in the array. And, where there is a null value, I want 999 entered. Thus, I want the resulting array to be something like this:
4.5 999 3.4 -7.9 6.5
I'm just having a bit of trouble doing the correct (and time-efficient) iteration. The columns and arrays are about 10,000 elements long. Thanks, Evan

Replies are listed 'Best First'.
Re: Comparing hashes and arrays
by blakem (Monsignor) on Sep 06, 2001 at 10:46 UTC
    Make a hash out of the file that you can use as a lookup table. Then loop through the array and print out the value if you have it, or the default value if you don't.
        #!/usr/bin/perl -wT
        use strict;

        my @names = map {"name".$_} (1..5);  # sneaky way to generate your example array

        my %name2number;
        while(<DATA>) {                      # generate the hash from the file
          my ($key,$value) = (split);
          $name2number{$key} = $value;
        }

        for my $name (@names) {              # loop through the array using the lookup hash,
                                             # or a default value
          my $number = $name2number{$name} || 999;
          print "$name => $number\n";
        }

        __DATA__
        name1 4.5
        name3 3.4
        name5 6.5
        name4 -7.9
        name6 3.2

    Output:

        name1 => 4.5
        name2 => 999
        name3 => 3.4
        name4 => -7.9
        name5 => 6.5

    -Blake
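A minimal sketch of the same lookup-table idea, assuming a value of 0 can turn up in the number column. It swaps the `|| 999` default for an `exists` test, so a legitimate 0 is kept rather than replaced. The sub name `numbers_in_order` and the sample data (including the `name2 => 0` entry) are made up for illustration and are not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the numbers for @$names in array order,
# substituting $default for any name that is missing from the lookup hash.
sub numbers_in_order {
    my ($lookup, $names, $default) = @_;
    return map { exists $lookup->{$_} ? $lookup->{$_} : $default } @$names;
}

# Illustrative data; the zero for name2 is an assumption added to show the difference.
my %name2number = (name1 => 4.5, name2 => 0, name3 => 3.4);
my @names       = qw(name1 name2 name3 name4);

my @with_exists = numbers_in_order(\%name2number, \@names, 999);
my @with_or     = map { $name2number{$_} || 999 } @names;

print "exists: @with_exists\n";   # exists: 4.5 0 3.4 999
print "||:     @with_or\n";       # ||:     4.5 999 3.4 999
```

Both versions make one pass over the file and one over the array, so either should be fine for lists of about 10,000 elements; the only difference is how a stored 0 is treated.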

Re: Comparing hashes and arrays
by lestrrat (Deacon) on Sep 06, 2001 at 10:40 UTC

    I don't know about efficiency, but something like this comes to mind:

        my( %hash, @order ); # assume these are initialized

        # 1 - map
        my @numbers = map{ exists $hash{ $_ } ? $hash{ $_ } : 999 } @order;

        # 2 - foreach
        my @numbers;
        foreach my $name ( @order ) {
            push @numbers, ( exists $hash{ $name } ? $hash{ $name } : 999 );
        }

    Something like that. Does that work?

(larryk) sticking with the schwartzian
by larryk (Friar) on Sep 06, 2001 at 13:35 UTC
    The Schwartzian Transform
        #!perl
        use strict;
        use warnings;
        use Data::Dumper;

        my @array = qw/name1 name2 name3 name4 name5/;
        my %file = ();
        $file{ (split)[0] } = (split)[1] while <DATA>;

        my @result = map  { $_->[1] }
                     sort { $a->[0] cmp $b->[0] }
                     map  { [ $_, exists $file{$_} ? $file{$_}+0 : 999 ] }
                     @array;

        print Dumper( \@result );

        # output:
        #
        #$VAR1 = [
        #          '4.5',
        #          999,
        #          '3.4',
        #          '-7.9',
        #          '6.5'
        #        ];

        __DATA__
        name1 4.5
        name3 3.4
        name5 6.5
        name4 -7.9
        name6 3.2
       larryk                                          
    perl -le "s,,reverse killer,e,y,rifle,lycra,,print"
Re: Comparing hashes and arrays
by Anarion (Hermit) on Sep 06, 2001 at 12:52 UTC
    If I understand you correctly, this should work:

        map{/(\S+)\s+([\d.]*)/;$2?($a{$1}=$2):($a{$1}=999)}<FILE>;
        map{push(@result,$a{$_})}@order;


    Update $anarion=\$anarion;

    s==q^QBY_^=,$_^=$[x7,print

      That is probably a pretty good start on a golf answer. I do wonder, however, whether it was at all useful to a user who asks this level of question. Perhaps you could explain how it works to the inquiring monk; that would make it more valuable.

      Update I don't believe this returns the right result either. I am still checking out why, will update shortly.

      Update 2:
      The solution you presented fails to produce the requested 999 result in the case of the missing entry. It also fails to capture the negative value, -7.9.

      I will break down the provided solution so the questioner can perhaps understand what is going on better.

          map{                     # map is a way of building a loop
                                   # map returns a list of the resulting values.
                                   # this use of map makes no use of the returned list
                                   # which is often considered bad form, however when
                                   # golfing it can be useful for shortening your code.
            /(\S+)\s+([\d.]*)/;    # this is a regular expression
                                   # (\S+) says to grab 1 or more
                                   #   non-white-space characters,
                                   #   the result captured to $1
                                   # \s+ says to grab 1 or more
                                   #   white-space characters
                                   # ([\d.]*) says to grab either
                                   #   digits (\d) or a period (.)
                                   #   0 or more times.
                                   #   this will be captured to $2
            $2                     # This is a ternary operator
                                   # A sometimes useful way of
                                   # writing an if-else statement.
                                   # This says "if $2"
            ? ( $a{$1} = $2 )      # then set $a{$1} to $2
            : ( $a{$1} = 999 )     # else set $a{$1} to 999
          }<FILE>;                 # the lines read from <FILE> will be used
                                   # as input to the map, as $_
          map{push(@result,$a{$_})}@order;
                                   # again a map, taking the order array and pushing the
                                   # related values from the %a hash into @result
                                   # giving you the ordered numbers.
      This code has two errors, and one potential gotcha.
      • ([\d.]*) does not catch the negative test value. It could be rewritten as (-?[\d.]+). Of course this will also catch values like -75.45.23.35
      • The gotcha: $2 ? ... will give the wrong result if the value associated with the name is 0 (zero). This can be fixed by using $2 ne '' instead, as one possibility.
      • map{push(@result,$a{$_})}@order; will not return the appropriate '999' responses, because the hash entries for names in @order were never set if they did not appear in the input file. We could fix this with map{push(@result,defined $a{$_} ? $a{$_} : 999)}@order (assuming the other fixes are in place).
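The first two points in the list above can be checked with a short sketch. The helper name `capture_number` and the `name7 0` test line are assumptions made up for this illustration; the two patterns are the ones quoted in the bullets.

```perl
use strict;
use warnings;

# Hypothetical helper: returns whatever the second capture group grabbed,
# or undef if the line does not match the pattern at all.
sub capture_number {
    my ($regex, $line) = @_;
    return $line =~ $regex ? $2 : undef;
}

my $original = qr/(\S+)\s+([\d.]*)/;     # pattern from the golfed answer
my $fixed    = qr/(\S+)\s+(-?[\d.]+)/;   # allows a leading minus sign

# [\d.]* happily matches zero characters in front of the '-'.
my $neg_original = capture_number($original, "name4 -7.9");   # ''
my $neg_fixed    = capture_number($fixed,    "name4 -7.9");   # '-7.9'

# The zero gotcha: the string '0' is false in a boolean test.
my $zero        = capture_number($fixed, "name7 0");          # '0'
my $truth_test  = $zero       ? $zero : 999;                  # 999 (wrong)
my $string_test = $zero ne '' ? $zero : 999;                  # 0   (kept)

print "original pattern: '$neg_original'\n";
print "fixed pattern:    '$neg_fixed'\n";
print "truth test: $truth_test, string test: $string_test\n";
```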

      The resulting fixed code...

          map{/(\S+)\s+(-?[\d.]+)/;$a{$1}=$2?$2:''}<DATA>;
          map{push(@result,defined $a{$_} ? $a{$_} : 999)}@order;

      Perhaps we should call golf-on?

        $2 ? ... will give the wrong result [...] This can be fixed using $2 ne ''

        Well, it seems to me from the original post that the file will always have both columns; if the match doesn't succeed then the line is not valid and (maybe?) shouldn't be considered.

        The code becomes (note that map in void context is almost always considerably slower than for):

            # assuming @i = map "name$_", 1..5
            /(\S+)\s+(-?\d+(?:\.\d+)?)/and$a{$1}=$2 for<DATA>;
            push@o,exists$a{$_}?$a{$_}:999 for@i;

        Remember, this simply ignores lines in the file that don't have both columns. It also ignores a second decimal (and everything after) in the second column. It all depends on how lenient we want to be of bad data.

        bbfu
        Seasons don't fear The Reaper.
        Nor do the wind, the sun, and the rain.
        We can be like they are.

        Your code now has an error that mine does not, if the second value is null.
        /(\S+)\s+(-?[\d.]+)/;
        If the second value is null, the regex doesn't match and $2 still holds the value from the previous match.
        Update:
        Sorry, I didn't see that bbfu had already fixed it by adding ?

        $anarion=\$anarion;

        s==q^QBY_^=,$_^=$[x7,print
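The stale-capture behaviour raised in the last reply can be sketched as follows. This is only an illustration under assumed sample lines (a `name2` line with no number column); the variable names are made up here. A failed match leaves `$1` and `$2` untouched, which is why the guarded form with `and` only stores a pair when the match succeeds.

```perl
use strict;
use warnings;

my $re = qr/(\S+)\s+(-?[\d.]+)/;

# A successful match sets the capture variables.
my $ok1         = "name1 4.5" =~ $re;
my @after_first = ($1, $2);               # ('name1', '4.5')

# A failed match (no second column) does NOT reset them.
my $ok2          = "name2" =~ $re;        # false
my @after_second = ($1, $2);              # still ('name1', '4.5')

# Guarded form, in the style of the golfed reply: assign only on success.
my %guarded;
my @lines = ("name1 4.5", "name2", "name3 3.4");
$_ =~ $re and $guarded{$1} = $2 for @lines;

print "after failed match: @after_second\n";
print "guarded keys: ", join(",", sort keys %guarded), "\n";
```

Whether a malformed line should be skipped, as here, or reported is a separate choice that depends on how much bad data the file is expected to contain.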