How do I dynamically populate a hash after parsing columns from a file

lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I am parsing a file and extracting two columns to use as key/value pairs in a
hash. After parsing the file and assigning the pairs, my hash is empty when it should hold 24750 key/value pairs.
Any ideas on how I can get all of the key value pairs into the hash?

#!/usr/bin/perl -w
use strict;
use Data::Dumper;
#open file
open(IN,"CCDS.20090902.txt") or die " Can't open ccds file: $!";
# initialize the hash
my %geneids =();
#open the file and push the info from the designated columns into it
# remove header
my $firstline = <IN>;
chomp $firstline; 
while(<IN>){
    chomp; # remove the newline character
    my @fields = split (/\t/, $_);
    #extract the columns that we are interested in.
    my($key, $value) = ($fields[2], $fields[3]);
    # Populate the key value pairs of the hash with $gene and $id
    $geneids{$key} = $value;
    print "$key\t$value\n";
}
# We can also get the size of the hash
print "Hash size: ", scalar keys %geneids, "\n";
close();
__DATA__
#chromosome    nc_accession    gene    gene_id    ccds_id    ccds_status
1    NC_000001.8    NCRNA00115    79854    CCDS1.1    Withdrawn
1    NC_000001.10    SAMD11    148398    CCDS2.2    Public
1    NC_000001.10    NOC2L    26155    CCDS3.1    Public
1    NC_000001.10    PLEKHN1    84069    CCDS4.1    Public
1    NC_000001.10    HES4    57801    CCDS5.1    Public
1    NC_000001.10    ISG15    9636    CCDS6.1    Public
1    NC_000001.10    C1orf159    54991    CCDS7.2    Public
1    NC_000001.10    TTLL10    254173    CCDS8.1    Public
1    NC_000001.10    TNFRSF18    8784    CCDS9.1    Public
1    NC_000001.10    TNFRSF18    8784    CCDS10.1    Public
1    NC_000001.10    TNFRSF4    7293    CCDS11.1    Public
1    NC_000001.10    SDF4    51150    CCDS12.1    Public
1    NC_000001.10    B3GALT6    126792    CCDS13.1    Public
1    NC_000001.10    UBE2J2    118424    CCDS14.1    Public

LomSpace


Replies are listed 'Best First'.
Re: How do I dynamically populate a hash after parsing columns from a file by ikegami (Patriarch) on Mar 08, 2011 at 22:53 UTC
Your code works fine: `... Hash size: 13`. PS: `close();` should be `close(IN);`
Re^2: How do I dynamically populate a hash after parsing columns from a file by lomSpace (Scribe) on Mar 09, 2011 at 05:39 UTC
Eclipse says that my hash size is zero. I know that the file has 24750 records because it is on my desktop and I opened it manually before I ran my script. I guess it's something with the Perl implementation of Eclipse. Thanks!
Re^3: How do I dynamically populate a hash after parsing columns from a file by ikegami (Patriarch) on Mar 09, 2011 at 07:09 UTC
Eclipse is not implemented in Perl. I think you mean "the perl launched by EPIC". And I seriously doubt that. I suspect something you said isn't true. Once you find out what, you'll have found your problem.
Re: How do I dynamically populate a hash after parsing columns from a file by lostjimmy (Chaplain) on Mar 08, 2011 at 23:00 UTC
There doesn't seem to be anything wrong with your code. I get `Hash size: 13` when I run it against the provided data. Also, you're using `Data::Dumper`, so why not see what that prints? Are you sure `CCDS.20090902.txt` actually contains anything?
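A rough sketch of those two checks, in case it helps. The sub name `file_status` and the two-entry stand-in hash are invented for illustration; only the file name comes from the original post. Note that a relative file name is resolved against the current working directory, which under an IDE may not be the directory you expect, so the sketch prints that too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use Data::Dumper;

# Hypothetical helper: report whether a path exists and how many bytes it holds.
sub file_status {
    my ($path) = @_;
    return "missing" unless -e $path;
    return "empty"   if -z $path;
    return sprintf("%d bytes", -s $path);
}

my $file = "CCDS.20090902.txt";    # the name used in the original post
print "working directory: ", getcwd(), "\n";
print "$file: ", file_status($file), "\n";

# Stand-in for the parsed data; in the real script, dump %geneids after the loop.
my %geneids = (SAMD11 => 148398, NOC2L => 26155);
$Data::Dumper::Sortkeys = 1;       # stable key order makes the dump easier to read
print Dumper(\%geneids);
```

If the status line says "missing" or "empty", or the working directory is not where the file lives, the parsing loop is not the place to look.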
Re: How do I dynamically populate a hash after parsing columns from a file by roboticus (Chancellor) on Mar 09, 2011 at 13:19 UTC
lomSpace: I was able to reproduce your problem with a couple of tweaks: I added `$/="\r\n";` just after your use statements, and I modified the data to end in "\n" only. So, I think you're running on a Windows box, so perl is assuming CRLF line endings, but your data comes from a Unix box and has LF endings. So your line to consume the header eats the entire file, and your while loop has nothing to process. Anyway, that's my guess.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
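The effect described in that guess can be sketched with an in-memory file, so it runs without the real data. The sub name `records_after_header` and the two sample rows are made up for illustration; this only shows what happens when `$/` does not match the data, not that this is what happened on the original poster's machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one "header" record, then count what is left,
# using the given input record separator.
sub records_after_header {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    my $header = <$fh>;    # reads up to the first separator, or to end of file
    my $count  = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close($fh);
    return $count;
}

# LF-terminated sample data: one header line and two records.
my $lf_data = "gene\tgene_id\nSAMD11\t148398\nNOC2L\t26155\n";

print records_after_header($lf_data, "\n"),   "\n";   # prints 2: separator matches the data
print records_after_header($lf_data, "\r\n"), "\n";   # prints 0: the header read swallowed everything
```

When the separator never occurs in the data, the first read returns the whole file as one record, which matches the symptom of an empty hash with no error.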
Re: How do I dynamically populate a hash after parsing columns from a file by wfsp (Abbot) on Mar 09, 2011 at 10:34 UTC
You provided a short script with some sample data, and monks are reporting that your script worked OK for them. I downloaded the code too but had to make a few small changes (the other monks must have done something similar): I commented out the `open(IN,"CCDS.20090902.txt")...` line and changed the two subsequent occurrences of `IN` to `DATA` (i.e. use the sample data). Worked OK for me too.

Your algorithm's fine. This is useful to know, and it points you further up the script. The question becomes, "Is the input what you think it is?" Is the file you're opening actually the file you think it is? For me at least, a common and frustrating state of affairs. :-)

Perhaps consider what I do in these cases: I put in some debugging code that prints the first 10 lines of the file after it's been opened. Add loud delimiters to the start and end of each line, say, `***`. This can help identify any issues with white space, newlines, etc. Have a good look. Is it, as lostjimmy suggests, empty?

Good luck!
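One way that debugging step might look, as a sketch only. The sub name `preview_lines` is invented here, and the sample text stands in for the real file; in the actual script you would pass the handle you just opened. Besides the `***` delimiters, it spells out tabs, carriage returns and newlines so they show up on screen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return up to $max records from a handle, each wrapped
# in *** markers, with tab, CR and LF characters spelled out.
sub preview_lines {
    my ($fh, $max) = @_;
    my @out;
    while (@out < $max and defined(my $line = <$fh>)) {
        $line =~ s/\t/\\t/g;
        $line =~ s/\r/\\r/g;
        $line =~ s/\n/\\n/g;
        push @out, "***$line***";
    }
    return @out;
}

# Stand-in data with CRLF endings; replace with the real file handle.
my $sample = "gene\tgene_id\r\nSAMD11\t148398\r\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!";
print "$_\n" for preview_lines($fh, 10);
close($fh);
```

If the whole file comes back as a single delimited record, or the fields are separated by something other than `\t`, that narrows things down quickly.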