
Update: I now have working code that reads the CSV file and builds a hash table. Surprisingly, when I print the hash and check its size, the script reports 13 and prints 13 keys and 13 values. I should have at least 150 key/value pairs in the hash table. Could someone take a look and suggest a fix?

Column_1,Column_2
Name11,In
Name21,Out
Name31,In
Name41,In

use strict;
use warnings;
use Data::Dumper;

my $infile = 'file.csv';

sub mainCSV {
    open (my $infile1, '<', $infile)
        or die "Unable to open $infile: $!";
    my %hash = ();
    while (my $line = <$infile1>) {
        chomp $line;                 # chomp the line just read, not $_
        $line =~ s/\s*\z//;
        my @array = split /,/, $line;
        my $key = shift @array;
        $hash{$key} = \@array;
    }
    my $size = scalar keys %hash;    # explicit scalar context
    print "Hash size: $size\n";      # prints Hash size: xx
    print Dumper(\%hash);
}

Re^5: csv to hash table
by Corion (Patriarch) on Nov 19, 2013 at 09:18 UTC

    Maybe you have duplicate keys?

    ...
    my $key = shift @array;
    if( $hash{ $key }) {
        warn "Duplicate key '$key'";
    };
    $hash{ $key } = \@array;
    ...
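To illustrate why duplicate keys would shrink the hash, here is a minimal sketch with made-up rows. The sample data and the `%seen` counter are assumptions for illustration and are not taken from the original file. Assigning to a key that already exists replaces the earlier value, so the hash ends up with one entry per distinct key:

```perl
use strict;
use warnings;

# Hypothetical sample rows; 'Name11' appears twice on purpose.
my @lines = ( "Name11,In", "Name21,Out", "Name11,Out" );

my ( %hash, %seen );
for my $line (@lines) {
    my @array = split /,/, $line;
    my $key   = shift @array;
    $seen{$key}++;              # count how often each key occurs
    $hash{$key} = \@array;      # a repeated key overwrites the earlier value
}

my $size = scalar keys %hash;
print "Lines read: ", scalar(@lines), ", hash size: $size\n";
print "Key '$_' occurred $seen{$_} times\n"
    for grep { $seen{$_} > 1 } sort keys %seen;
```

If the real file has 150 lines but only 13 distinct values in the first column, a hash size of 13 is what this behaviour would produce.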
Re^5: csv to hash table
by locked_user sundialsvc4 (Abbot) on Nov 19, 2013 at 14:56 UTC

    We really need to see the output. I see a potential problem here in that you are stashing “a reference to @array” ... while that is a single variable. Hence, all of the references will be to the same block of storage, namely, @array. In Fortran parlance, all of the references are EQUIVALENCEd. They will all be seen to contain the last contents of @array, and a change to any one will be reflected in every other, because “you are actually looking at one block of storage, albeit through several mirrors.”

    I think that you need to be sure that each hash-bucket contains a unique arrayref, and that the values get pushed onto that. Perl’s “auto-vivification” feature comes in handy, with something like this:

    use strict;
    use warnings;
    use Data::Dumper;

    my @arry = ( "key", 1, 2, 3 );
    my $hash;
    my $key = shift @arry;

    # NEXT STATEMENT 'AUTOMAGICALLY' CREATES A HASH-ENTRY FOR $key
    # AND CAUSES IT TO CONTAIN AN EMPTY ARRAY IF IT
    # DOES NOT ALREADY EXIST:
    push @{ $hash->{$key} }, $_ foreach @arry;

    print Data::Dumper->Dump([ \@arry, $hash], ["arry", "hash"] );

    gives ...
    $arry = [
              1,
              2,
              3
            ];
    $hash = {
              'key' => [
                         1,
                         2,
                         3
                       ]
            };

    The foreach clause is shorthand for an equivalent loop.   $_ contains the value within each iteration.   Notice how this loop is non-destructive to the content of @arry, iterating through its values without disturbing them.   The magic works now, because we are making copies of each value and pushing those onto a new arrayref (created on-demand) within the hash-bucket for $key.   Each hash-bucket, and each of the values therein, is distinct.

    In the statement-of-interest, @{ ... } is part but not all of the magic.   Here, we are telling Perl that the value within the hash-bucket should be interpreted as / initialized to an arrayref.   Perl will automatically create a hash-entry (of course) on demand, because that is what hashes do, but here we’re declaring its type and immediately using it.   We can “auto-vivify” hashrefs, too, so that a line of code something like this ... actually Just Works™:

    $hash->{"hickory"}{"dickory"}{"dock"} = "clock";
    gives...
    $hash = {
              'hickory' => {
                             'dickory' => {
                                            'dock' => 'clock'
                                          }
                           }
            };

      "while that is a single variable"
      Not really: the variable @array is declared with my inside the loop, so every iteration gets a fresh array and each stored reference points to a different one.
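A small sketch of that point, using made-up input lines (the data here is an assumption for illustration). Because `my @array` sits inside the loop body, the references stored in the hash refer to separate arrays:

```perl
use strict;
use warnings;

my %hash;
for my $line ( "a,1", "b,2" ) {
    my @array = split /,/, $line;   # a new @array for every iteration
    my $key   = shift @array;
    $hash{$key} = \@array;
}

# The two stored references point at different arrays ...
print "distinct arrays\n" if $hash{a} != $hash{b};

# ... so changing one does not affect the other.
push @{ $hash{a} }, 'extra';
print "a: @{ $hash{a} }\n";
print "b: @{ $hash{b} }\n";
```

Had `@array` been declared once outside the loop, every hash value would indeed have referred to the same array.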
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks for the input. Basically, the hash will group the values of duplicate keys. I switched to the new code and commented out the duplicate-key warning. When I run it, I get the error "Global symbol "$hash" requires explicit package name at ./file_name.pl line 69." As I understand it, without defining a hash reference we can't refer to, or write data into, the hash table that way. Please correct me if I'm wrong.

      sub mainCSV {
          # Open the CSV input file
          open (my $infile_CSV1, '<', "$infile_CSV") or die "Unable to open $infile_CSV: $!\n";
          my %hash = ();
          my $hash_ref = \%hash;
          while (my $line = <$infile_CSV1>) {
              chomp;
              $line =~ s/\s*\z//;
              my @array_CSV = split /,/, $line;
              my $key_CSV = shift @array_CSV;
              push @{ $hash->{$key_CSV} }, $_ foreach @array_CSV;                            # ++++ Error ++++
              print Data::Dumper->Dump([ \@array_CSV, $hash], ["array_CSV", "hash"] );       # ++++ Error ++++
              #if ($hash{$key_CSV})
              #{
              #    warn "Duplicate key '$key_CSV'";
              #};
              #$hash{$key_CSV} = \@array_CSV;
          }
          # Explicit scalar context
          my $size = scalar keys %hash;
          # Open the output file and save hash in $outfile_RX_CSV
          #open (my $outfile2, '>', "$outfile_CSV") or die "Unable to open $outfile_CSV: $!\n";
          #print $outfile2 Dumper(\%hash);
          #close $outfile2;
          #print "Stored $size list of pins in $outfile_CSV file.\n";
          # Return a reference to the hash.
          #return %hash;
      }

        The syntax for accessing values in a hash via a hash variable is different from the syntax for accessing values in a hash via a reference to a hash variable. You are using one where you should be using the other, which is why you are getting an error.

        To access a value in a hash via a hash variable, you can use the following syntax:

        my $value = $hash{$key};

        To access a value in a hash via a reference to a hash variable, you can use the following syntax:

        my $value = $hash_ref->{$key};

        The -> in this second example dereferences the hash ref.

        You have a hash variable (%hash) and a reference to a hash variable ($hash_ref). You don't need the latter, but you can use it if you want to. In fact, you declare $hash_ref and set it to a reference to %hash, but you don't use it anywhere, so you can simply remove it. It makes no difference in the code you posted.

        The first line you marked with "++++ Error ++++" includes: $hash->{$key_CSV}. Because $hash is followed by ->, $hash must be a scalar containing a reference, but you don't have such a variable defined in the code you have posted.

        If you change that bit to $hash{$key_CSV}, then you will be accessing the value from %hash which you have declared and initialized.

        At the second line marked with "++++ Error ++++", you have $hash. This is a scalar variable but in the code you posted it is not declared or initialized. While $hash{$key} accesses a value in the hash %hash, $hash does not refer to %hash at all, it refers to the scalar with a similar name.

        The ability to use the same "name" to refer to different variables can be a confusing aspect of Perl. Remember that $name, @name and %name are three different variables: changes to one do not affect the others. It can be more confusing that $name{$key} accesses %name rather than $name, but that's the way it is.
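A minimal sketch of that naming rule, using a hypothetical `name` and made-up values. Three separate variables share one name, and the sigil plus the subscript syntax decide which one is accessed:

```perl
use strict;
use warnings;

my $name = { colour => 'red' };    # scalar holding a hash reference
my @name = ( 'first', 'second' );  # array
my %name = ( colour => 'blue' );   # hash

print "hash element:      $name{colour}\n";     # reads %name
print "via the reference: $name->{colour}\n";   # reads the hash $name points to
print "array element:     $name[1]\n";          # reads @name

$name{colour} = 'green';           # changes %name only
print "after update:      $name{colour} / $name->{colour}\n";
```

Updating `$name{colour}` leaves the hash behind `$name` untouched, which is why `$hash->{...}` fails under strict when only `%hash` has been declared.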

        There are many ways you could change the subroutine you posted to eliminate the errors. Here are a couple of simple changes for you to consider:

        sub mainCSV {
            # Open the CSV input file
            open (my $infile_CSV1, '<', "$infile_CSV")
                or die "Unable to open $infile_CSV: $!\n";
            my %hash = ();
            while (my $line = <$infile_CSV1>) {
                chomp $line;    # chomp the line just read, not $_
                $line =~ s/\s*\z//;
                my @array_CSV = split /,/, $line;
                my $key_CSV = shift @array_CSV;
                push @{ $hash{$key_CSV} }, $_ foreach @array_CSV;
                print Data::Dumper->Dump([ \@array_CSV, \%hash], ["array_CSV", "hash"] );
            }
            # Explicit scalar context
            my $size = scalar keys %hash;
        }

        Or

        sub mainCSV {
            # Open the CSV input file
            open (my $infile_CSV1, '<', "$infile_CSV")
                or die "Unable to open $infile_CSV: $!\n";
            my %hash = ();
            my $hash_ref = \%hash;
            while (my $line = <$infile_CSV1>) {
                chomp $line;    # chomp the line just read, not $_
                $line =~ s/\s*\z//;
                my @array_CSV = split /,/, $line;
                my $key_CSV = shift @array_CSV;
                push @{ $hash_ref->{$key_CSV} }, $_ foreach @array_CSV;
                print Data::Dumper->Dump([ \@array_CSV, $hash_ref], ["array_CSV", "hash_ref"] );
            }
            # Explicit scalar context
            my $size = scalar keys %hash;
        }

        And if you really want to do it the second way, you can simplify that to:

        sub mainCSV {
            # Open the CSV input file
            open (my $infile_CSV1, '<', "$infile_CSV")
                or die "Unable to open $infile_CSV: $!\n";
            my $hash_ref = {};
            while (my $line = <$infile_CSV1>) {
                chomp $line;    # chomp the line just read, not $_
                $line =~ s/\s*\z//;
                my @array_CSV = split /,/, $line;
                my $key_CSV = shift @array_CSV;
                push @{ $hash_ref->{$key_CSV} }, $_ foreach @array_CSV;
                print Data::Dumper->Dump([ \@array_CSV, $hash_ref], ["array_CSV", "hash_ref"] );
            }
            # Explicit scalar context
            my $size = scalar keys %$hash_ref;
        }
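For anyone who wants to try the approach without a file on disk, here is a self-contained sketch. It assumes an in-memory filehandle, sample rows in the format shown in the question with one key repeated on purpose, and a hypothetical helper named `read_csv` that returns a hash reference. None of these names come from the original script.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample data in the format shown in the question; 'Name11' is repeated
# here on purpose (an assumption, to show the grouping).
my $csv = <<'END';
Column_1,Column_2
Name11,In
Name21,Out
Name11,Out
END

# read_csv is a hypothetical helper name, not from the thread.
sub read_csv {
    my ($fh) = @_;
    my %hash;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\s*\z//;
        next if $line eq '';
        my ( $key, @rest ) = split /,/, $line;
        push @{ $hash{$key} }, @rest;   # autovivifies the array on first use
    }
    return \%hash;
}

open( my $fh, '<', \$csv ) or die "Unable to open in-memory file: $!";
my $data = read_csv($fh);
close $fh;

print "Hash size: ", scalar( keys %$data ), "\n";
print Dumper($data);
```

Note that the header row becomes an ordinary key here, just as in the original code, and that a plain `split /,/` does not cope with quoted fields that contain commas.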