Re: Array of Hashes population

I'm sure that by now you've gotten a number of good answers, but I couldn't resist working with this one. What happens with the statement:

push @data, %tmp;

is that the hash is flattened into a list of alternating keys and values, so the hash structure is lost. Wrapping %tmp in curly braces, i.e. '{' and '}', builds a new anonymous hash from its contents and yields a reference to it, which preserves the hash structure.

push @data, { %tmp };

and later:

print $data[0]->{COMPONENT};
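
To make the difference concrete, here is a minimal sketch that pushes the same hash both ways and compares the results. The contents of %tmp are made up for illustration (only the COMPONENT key comes from the snippet above), and @flat is a hypothetical name.

```perl
use strict;
use warnings;

# Illustrative data only; the values are assumptions, not from the thread.
my %tmp = ( COMPONENT => 'widget', VERSION => '1.2' );

# Pushing the hash itself flattens it into a key/value list.
my @flat;
push @flat, %tmp;

# Pushing { %tmp } stores a single reference to a new anonymous hash.
my @data;
push @data, { %tmp };

print scalar(@flat), "\n";            # 4 elements: two keys and two values
print scalar(@data), "\n";            # 1 element: a hash reference
print ref( $data[0] ), "\n";          # HASH
print $data[0]->{COMPONENT}, "\n";    # widget
```

With the flattened version there is nothing left to dereference: $flat[0] is just a key or a value, in whatever order the hash happened to return them.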

Code where you can see all this (play with the push statement) is given below:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $file = "<DATA>";
my @columns = ( "ONE", "TWO", "THREE", "FOUR", "FIVE" );

my @AoH = read_dat($file);
print Data::Dumper->Dump(\@AoH);

print "\n\nArray of hash element: ", $AoH[0]->{"THREE"}, "\n";


sub read_dat
{
    my $file = shift;
    my @data = ();
    
    for my $line( <DATA> ) {
        next unless $line =~ m/\w+/;
        my @line = split (",", $line);
        my $i = 0;
        my %tmp;
        
        foreach my $col (@columns){
            $tmp{$col} = $line[$i];
            $i++;
        }
        push @data, { %tmp };
    
    }
    return @data;
}
__DATA__
0,1,2,3,4
1,2,3,4,5
2,3,4,5,6
3,4,5,6,7

Update: typo fixes


Replies are listed 'Best First'.

Re^2: Array of Hashes population
by TGI (Parson) on Mar 06, 2008 at 19:56 UTC

That was an excellent explanation of the problem.

I thought that it was worth commenting on the two suggested methods of storing the hash ref in the array, because there is a subtle difference between them.

# Method 1
push @data, {%tmp};

# Method 2
push @data, \%tmp;

Method 1 copies the keys and values of %tmp into a new anonymous hash (a shallow copy). The reference to that anonymous hash is stored in @data.

Method 2 stores a reference to the %tmp hash itself; no copy is made.

Either of these approaches could be desirable or lead to interesting bugs, depending on the situation. Other times, it may make almost no difference. In this case, I'd probably use method 2 to avoid an unnecessary copy operation, but I see no strong argument for either approach in this case--it really comes down to personal preference.
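
One way to see where the "interesting bugs" can come from is the sketch below. It deliberately declares %tmp outside the loop, which is an assumption made to provoke the problem and not how the code in this thread is written. The names @rows, @by_copy and @by_ref are hypothetical.

```perl
use strict;
use warnings;

# Made-up input rows for illustration.
my @rows = ( [ 'a', 1 ], [ 'b', 2 ] );

# %tmp is declared OUTSIDE the loop on purpose, to provoke the bug.
my %tmp;
my ( @by_copy, @by_ref );

for my $row (@rows) {
    %tmp = ( name => $row->[0], value => $row->[1] );
    push @by_copy, {%tmp};    # Method 1: a fresh anonymous hash each time
    push @by_ref,  \%tmp;     # Method 2: the same %tmp every time
}

print join( ",", map { $_->{name} } @by_copy ), "\n";    # a,b
print join( ",", map { $_->{name} } @by_ref ),  "\n";    # b,b
```

Every element of @by_ref points at the one shared %tmp, so they all show the last row. When %tmp is declared with my inside the loop body, each iteration gets a new hash and Method 2 is safe.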

The OP might find perldsc and perlreftut to be interesting reading.

BTW, I made a couple of other changes. I used map to generate the hash. It's a bit more compact than the for loop.

Another change is using a while loop to read the file. When you use a for loop, you will read the entire file into memory before processing begins. With a while loop, only one line is loaded at a time.

Also, note the test in the while loop. The 'defined' keeps the loop from ending early on a line that evaluates to false in a boolean context, for example a final "0" with no trailing newline. (Perl adds this test implicitly to a plain while ( my $line = <DATA> ), so writing it out mainly makes the intent explicit.) I also added a chomp so that the last field doesn't end in "\n" all the time--that gets annoying.

sub read_dat {
    my $file = shift;
    my @data = ();

    while ( defined (my $line = <DATA>) ) {
        next unless $line =~ m/\w+/;
        chomp $line;
        my @line = split (",", $line);
        my %tmp = map { $columns[$_] => $line[$_] } 0..$#columns;
        push @data, \%tmp;
    }
    return @data;
}

Warning: I haven't tested the above code. It may harbor typos and silly logic errors.

TGI says moo
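
Since the code above is flagged as untested, here is a small self-contained sketch that exercises the same loop. It is a variant, not the poster's exact sub: the hypothetical read_rows takes the filehandle and column names as arguments, and an in-memory filehandle with made-up sample text stands in for the DATA section.

```perl
use strict;
use warnings;

# Hypothetical variant of read_dat: the filehandle and column names are
# passed in rather than taken from DATA and a file-scoped @columns.
sub read_rows {
    my ( $fh, @columns ) = @_;
    my @data;

    while ( defined( my $line = <$fh> ) ) {
        next unless $line =~ m/\w+/;    # skip blank lines
        chomp $line;                    # drop the trailing newline
        my @fields = split /,/, $line;
        my %tmp = map { $columns[$_] => $fields[$_] } 0 .. $#columns;
        push @data, \%tmp;              # %tmp is fresh on every iteration
    }
    return @data;
}

# In-memory filehandle standing in for a real file; the sample text is
# an assumption modelled on the thread's data, with a blank line added.
my $text = "0,1,2,3,4\n\n1,2,3,4,5\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @AoH = read_rows( $fh, qw(ONE TWO THREE FOUR FIVE) );
close $fh;

print scalar(@AoH), "\n";       # 2
print $AoH[1]->{FIVE}, "\n";    # 5, with no trailing newline attached
```

Because %tmp is declared inside the loop, pushing \%tmp stores a different hash on each pass, and the chomp means the FIVE field compares equal to '5' and not "5\n".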
