barakuda has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am trying to read a CSV file into an array of hashes, where the array indexes represent lines in the file and the hash keys are the names of the columns. Here's my subroutine:
sub read_dat {
    my @data = ();
    open (IN, "<Weekly_Data.csv") or die "ERROR: Could not open weekly data file!";
    foreach my $line (<IN>){
        my @line = split (",", $line);
        my $i = 0;
        my %tmp;
        foreach my $col (@columns){
            $tmp{$col} = $line[$i];
            $i++
        }
        push (@data, %tmp);
    }
    close (IN);
    return @data;
}
But when I try to print something like print $data[0]{COMPONENT}, Perl complains that the string "Component " cannot be used as a HASH reference. I think the problem is in how I populate the array. Any ideas? Thank you.

Re: Array of Hashes population
by Anonymous Monk on Mar 06, 2008 at 16:56 UTC
    You probably want to add a hash reference (a scalar) to the end of your array:
    push @data, \%tmp;


    You might prefer to return an array reference for speed.
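A minimal sketch of returning an array reference, assuming hypothetical column names and an in-memory file in place of Weekly_Data.csv; read_dat_ref is a made-up name. The sub hands back one scalar instead of a list, so the elements are not copied into the caller's list on return; the saving is likely to matter only for large files.

```perl
use strict;
use warnings;

# Hypothetical column names (the original keeps @columns outside the sub).
my @columns = qw(COMPONENT COUNT);

# Hypothetical sub: returns ONE scalar (a reference to @data), not a list.
sub read_dat_ref {
    my ($fh) = @_;
    my @data;
    while ( my $line = <$fh> ) {
        chomp $line;
        my %tmp;
        @tmp{@columns} = split /,/, $line;   # fill the hash from one line
        push @data, \%tmp;                   # fresh %tmp on every iteration
    }
    return \@data;                           # caller gets an array reference
}

# In-memory sample data stands in for the real CSV file.
open my $fh, '<', \"widget,3\ngadget,5\n" or die "Cannot open: $!";
my $rows = read_dat_ref($fh);
close $fh;

print $rows->[0]{COMPONENT}, "\n";   # widget
print scalar(@$rows), " rows\n";     # 2 rows
```

The caller then dereferences with $rows->[0]{COMPONENT} instead of $data[0]{COMPONENT}.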
      Wow! That's beautiful :) Thank you! So, you can't actually have an array of hashes, only an array of hash references?

        Correct; elements of Perl arrays and hashes must be scalars. One "converts" a variable (@array, %hash) into a reference to a variable by prefixing a backslash(\), so the reference to @array would be \@array. I believe that references can also point to subs, which means that one could have an array which has elements that are, variously, references to hashes, references to arrays, references to subs, references to scalars, and actual scalars.
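A short sketch of such a mixed array, with made-up variable names and a hypothetical sub greet. Each element is a scalar: four of them are references (to a hash, an array, a sub and a scalar) and one is a plain string; ref() reports which kind each is.

```perl
use strict;
use warnings;

my %hash   = ( name => 'widget' );
my @array  = ( 1, 2, 3 );
my $scalar = 42;
sub greet { return "hello, $_[0]" }   # hypothetical sub for illustration

# One array holding a hash ref, an array ref, a code ref, a scalar ref,
# and a plain scalar.
my @mixed = ( \%hash, \@array, \&greet, \$scalar, 'plain' );

print $mixed[0]{name}, "\n";          # widget
print $mixed[1][2], "\n";             # 3
print $mixed[2]->('monks'), "\n";     # hello, monks
print ${ $mixed[3] }, "\n";           # 42
print join( ',', map { ref($_) || 'not a ref' } @mixed ), "\n";
```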


        emc


Re: Array of Hashes population
by dwm042 (Priest) on Mar 06, 2008 at 18:25 UTC
    I'm sure that by now you've gotten a number of good answers, but I couldn't resist working with this one. What happens with the statement:

    push @data, %tmp;
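    is that the hash is flattened into a list of key/value pairs, each of which is pushed onto @data as a separate element, so the hash structure is lost. Wrapping %tmp in curly braces, i.e. { %tmp }, builds a new anonymous hash from that list and pushes a reference to it, which preserves the hash structure.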
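A small sketch of the difference, using a made-up two-key hash. Flattening pushes four separate scalars, so $flat[0] is merely one of the keys; dereferencing it under use strict dies much like the error in the original question.

```perl
use strict;
use warnings;

my %tmp = ( COMPONENT => 'widget', COUNT => 3 );   # hypothetical row

my @flat;
push @flat, %tmp;        # pushes 4 scalars: key, value, key, value
my @nested;
push @nested, { %tmp };  # pushes 1 scalar: a ref to a new anonymous hash

print scalar(@flat), " elements in \@flat\n";      # 4
print scalar(@nested), " element in \@nested\n";   # 1
print $nested[0]{COMPONENT}, "\n";                 # widget

# $flat[0] is a plain string (one of the keys), so using it as a hash
# reference fails under "use strict".
eval { my $x = $flat[0]{COMPONENT}; 1 } or print "Error: $@";
```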

    push @data, { %tmp };
    and later:
    print $data[0]->{COMPONENT};
    Code where you can see all this (play with the push statement) is given below:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my $file = "<DATA>";
    my @columns = ( "ONE", "TWO", "THREE", "FOUR", "FIVE" );

    my @AoH = read_dat($file);
    print Data::Dumper->Dump(\@AoH);
    print "\n\nArray of hash element: ", $AoH[0]->{"THREE"}, "\n";

    sub read_dat {
        my $file = shift;
        my @data = ();
        for my $line( <DATA> ) {
            next unless $line =~ m/\w+/;
            my @line = split (",", $line);
            my $i = 0;
            my %tmp;
            foreach my $col (@columns){
                $tmp{$col} = $line[$i];
                $i++;
            }
            push @data, { %tmp };
        }
        return @data;
    }
    __DATA__
    0,1,2,3,4
    1,2,3,4,5
    2,3,4,5,6
    3,4,5,6,7


    Update: typo fixes

      That was an excellent explanation of the problem.

      I thought that it was worth commenting on the two suggested methods of storing the hash ref in the array, because there is a subtle difference between them.

      # Method 1
      push @data, {%tmp};

      # Method 2
      push @data, \%tmp;

      Method 1 copies the keys and values in %tmp into a new anonymous hash; the reference to that anonymous hash is stored in @data.

      Method 2 stores a reference to the %tmp hash.

      Either approach could be desirable or lead to subtle bugs, depending on the situation; other times it makes almost no difference. In this case, I'd probably use method 2 to avoid an unnecessary copy (it is safe here because my %tmp is declared inside the loop, so every iteration gets a fresh hash), but I see no strong argument for either approach. It really comes down to personal preference.
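A minimal sketch of the kind of bug that can bite method 2, assuming a hypothetical variant where the hash is declared once, outside the loop. Every pushed reference then points at the same hash, so all rows show the last line read; method 1 copies the current contents each time and is unaffected.

```perl
use strict;
use warnings;

my @columns = qw(ONE TWO);         # hypothetical columns
my @lines   = ( "a,b", "c,d" );    # hypothetical input lines

# Method 2 with %tmp declared ONCE, outside the loop (the buggy variant).
my %tmp;
my @shared;
for my $line (@lines) {
    @tmp{@columns} = split /,/, $line;
    push @shared, \%tmp;           # every element points at the same hash
}

# Method 1 with the same outer-hash layout: each push copies the contents.
my %tmp2;
my @copied;
for my $line (@lines) {
    @tmp2{@columns} = split /,/, $line;
    push @copied, { %tmp2 };       # a fresh anonymous hash per row
}

print "shared: $shared[0]{ONE} $shared[1]{ONE}\n";   # shared: c c
print "copied: $copied[0]{ONE} $copied[1]{ONE}\n";   # copied: a c
```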

      The OP might find perldsc and perlreftut to be interesting reading.

      BTW, I made a couple of other changes. I used map to generate the hash; it's a bit more compact than the for loop. I also switched to a while loop to read the file. A for loop evaluates <DATA> in list context, so the entire file is read into memory before processing begins; with a while loop, only one line is loaded at a time. Also, note the test in the while loop. The defined() makes the loop stop only at end of file, not at a line that evaluates to false in a boolean context, such as a final line "0" with no trailing newline ("0\n" itself is true). Perl also adds this defined() test implicitly for a loop written as while (my $line = <DATA>), so spelling it out is harmless rather than required. Finally, I added a chomp so that the last field doesn't end in "\n" all the time; that gets annoying.
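A small sketch of the end-of-file point, using a hypothetical in-memory file whose last line is "0" with no trailing newline. That line is false in boolean context, yet the defined() test keeps it, so both lines are read.

```perl
use strict;
use warnings;

# "0\n" is true (it is neither "" nor "0"); a bare "0" is false.
print "\"0\\n\" is ", ( "0\n" ? "true" : "false" ), "\n";
print "\"0\" is ",    ( "0"   ? "true" : "false" ), "\n";

# Hypothetical in-memory data: the last line is "0" without "\n".
open my $fh, '<', \"1,2\n0" or die "Cannot open in-memory file: $!";
my @seen;
while ( defined( my $line = <$fh> ) ) {
    push @seen, $line;    # the loop ends only at EOF, so "0" is kept
}
close $fh;
print scalar(@seen), " lines read\n";   # 2 lines read
```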

      sub read_dat {
          my $file = shift;
          my @data = ();
          while ( defined (my $line = <DATA>) ) {
              next unless $line =~ m/\w+/;
              chomp $line;
              my @line = split (",", $line);
              my %tmp = map { $columns[$_] => $line[$_] } 0..$#columns;
              push @data, \%tmp;
          }
          return @data;
      }

      Warning: I haven't tested the above code. It may harbor typos and silly logic errors.


      TGI says moo

Re: Array of Hashes population
by Roy Johnson (Monsignor) on Mar 06, 2008 at 17:10 UTC
    A slice assignment would spare you walking through the columns and maintaining a counter.
    #!perl
    use strict;
    use warnings;

    my @columns = qw(col1 col2 col3);
    my @data = ();
    while (<DATA>) {
        chomp;
        my %tmp;
        @tmp{@columns} = split ',';
        push @data, \%tmp;
    }
    use Data::Dumper;
    print Dumper \@data;
    __DATA__
    one,two,three
    four,five,six

Re: Array of Hashes population
by EvanCarroll (Chaplain) on Mar 06, 2008 at 17:32 UTC
    You're doing this the difficult way. Use DBD::CSV; at the very least, that makes all of the DBI methods available to you, including selectall_hashref:
    $hash_ref = $dbh->selectall_hashref($statement, $key_field);
    $hash_ref = $dbh->selectall_hashref($statement, $key_field, \%attr);
    $hash_ref = $dbh->selectall_hashref($statement, $key_field, \%attr, @bind_values);
    The $key_field parameter defines which column, or columns, are used as keys in the returned hash. It can either be the name of a single field, or a reference to an array containing multiple field names. Using multiple names yields a tree of nested hashes.
    Of course, with this method you have to know how to write a SELECT statement...
    Update: if you're looking for an AoH, just
    my @rows;
    while ( my $row = $sth->fetchrow_hashref ) {
        push @rows, $row;
    }


    Evan Carroll