lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I am parsing a file and extracting two columns to use as key/value pairs in a
hash. After parsing the file and assigning the key/value pairs, my hash is empty when there should be 24750 key/value pairs.
Any ideas on how I can get all of the key value pairs into the hash?

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

#open file
open(IN,"CCDS.20090902.txt") or die " Can't open ccds file: $!";

# initialize the hash
my %geneids =();

#open the file and push the info from the designated columns into it
# remove header
my $firstline = <IN>;
chomp $firstline;

while(<IN>){
    chomp; # remove the newline character
    my @fields = split (/\t/, $_);
    #extract the columns that we are interested in.
    my($key, $value) = ($fields[2], $fields[3]);
    # Populate the key value pairs of the hash with $gene and $id
    $geneids{$key} = $value;
    print "$key\t$value\n";
}

# We can also get the size of the hash
print "Hash size: ", scalar keys %geneids, "\n";
close();

__DATA__
#chromosome	nc_accession	gene	gene_id	ccds_id	ccds_status
1	NC_000001.8	NCRNA00115	79854	CCDS1.1	Withdrawn
1	NC_000001.10	SAMD11	148398	CCDS2.2	Public
1	NC_000001.10	NOC2L	26155	CCDS3.1	Public
1	NC_000001.10	PLEKHN1	84069	CCDS4.1	Public
1	NC_000001.10	HES4	57801	CCDS5.1	Public
1	NC_000001.10	ISG15	9636	CCDS6.1	Public
1	NC_000001.10	C1orf159	54991	CCDS7.2	Public
1	NC_000001.10	TTLL10	254173	CCDS8.1	Public
1	NC_000001.10	TNFRSF18	8784	CCDS9.1	Public
1	NC_000001.10	TNFRSF18	8784	CCDS10.1	Public
1	NC_000001.10	TNFRSF4	7293	CCDS11.1	Public
1	NC_000001.10	SDF4	51150	CCDS12.1	Public
1	NC_000001.10	B3GALT6	126792	CCDS13.1	Public
1	NC_000001.10	UBE2J2	118424	CCDS14.1	Public


LomSpace

Replies are listed 'Best First'.
Re: How do I dynamically populate a hash after parsing columns from a file
by ikegami (Patriarch) on Mar 08, 2011 at 22:53 UTC
    Your code works fine:
    ... Hash size: 13

    PS — close(); should be close(IN);

      Eclipse says that my hash size is zero. I know that the file has 24750 records
      because it is on my desktop and I opened it manually before I ran
      my script. I guess it's something with the Perl implementation of Eclipse.
      Thanks!
        Eclipse is not implemented in Perl. I think you mean "the perl launched by EPIC". And I seriously doubt that. I suspect something you said isn't true. Once you find out what, you'll have found your problem.
Re: How do I dynamically populate a hash after parsing columns from a file
by lostjimmy (Chaplain) on Mar 08, 2011 at 23:00 UTC
    There doesn't seem to be anything wrong with your code. I get Hash size: 13 when I run it against the provided data. Also, you're using Data::Dumper, so why not see what that prints?

    Are you sure CCDS.20090902.txt actually contains anything?
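
A minimal sketch of both checks, in case it helps. It assumes the same tab-separated layout as the posted sample; `load_geneids` is a hypothetical helper (not part of the original script), and the in-memory sample only stands in for the real file so the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the original script): build the
# gene => gene_id hash from any open filehandle, skipping the header line.
sub load_geneids {
    my ($fh) = @_;
    my %geneids;
    my $header = <$fh>;    # discard the header line
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        my @fields = split /\t/, $line;
        next unless @fields >= 4;    # skip short or blank lines
        $geneids{ $fields[2] } = $fields[3];
    }
    return \%geneids;
}

# Demo on an in-memory sample so the sketch runs without the real file.
my $sample = "#chromosome\tnc_accession\tgene\tgene_id\n"
           . "1\tNC_000001.10\tSAMD11\t148398\n"
           . "1\tNC_000001.10\tNOC2L\t26155\n";
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my $geneids = load_geneids($fh);
close $fh;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($geneids);
print "Hash size: ", scalar( keys %$geneids ), "\n";

# For the real file, -s gives the size in bytes when the file is non-empty:
# my $size = -s 'CCDS.20090902.txt';
# print $size ? "size: $size bytes\n" : "file is missing or empty\n";
```

If the dump shows an empty hash while `-s` reports a sensible size, the file is being opened but the read loop is not seeing any records, which narrows the search considerably.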

Re: How do I dynamically populate a hash after parsing columns from a file
by roboticus (Chancellor) on Mar 09, 2011 at 13:19 UTC

    IomSpace:

    I was able to reproduce your problem with a couple tweaks:

    • I added $/="\r\n"; just after your use statements, and
    • I modified the data to end in "\n" only.

    So, I think you're running on a Windows box and perl is assuming CRLF line endings, but your data comes from a Unix box and has LF endings. The line that consumes the header would then eat the entire file, leaving your while loop nothing to process. Anyway, that's my guess.
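
For anyone who wants to see the effect of those two tweaks in isolation, here is a small sketch. It does not claim this is what happens on the original poster's machine; it only shows what a mismatch between `$/` and the data's line endings does to the header read. `rows_after_header` is a hypothetical helper and the four-line sample is made up.

```perl
use strict;
use warnings;

# LF-only sample data, matching the second tweak above.
my $data = "header\nrow1\nrow2\nrow3\n";

# Hypothetical helper: count the records left after discarding the "header",
# using the given input record separator.
sub rows_after_header {
    my ($separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$data or die "Can't open in-memory data: $!";
    my $header = <$fh>;       # reads up to the first separator, or to EOF
    my $count  = 0;
    while ( defined( my $line = <$fh> ) ) {
        $count++;
    }
    close $fh;
    return $count;
}

print "separator LF:   ", rows_after_header("\n"),   " rows\n";    # 3 rows
print "separator CRLF: ", rows_after_header("\r\n"), " rows\n";    # 0 rows
```

With the CRLF separator the data contains no record boundary at all, so the single header read returns the whole string and the loop body never runs. The same thing would happen with any file whose line endings differ from what `$/` expects.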

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: How do I dynamically populate a hash after parsing columns from a file
by wfsp (Abbot) on Mar 09, 2011 at 10:34 UTC
    You provided a short script with some sample data and monks are reporting that your script worked ok for them. I downloaded the code too but had to make a few small changes (the other monks must have done something similar).

    I commented out the

    open(IN,"CCDS.20090902.txt")...
    line and changed the two subsequent occurrences of IN to DATA (i.e. use the sample data). Worked ok for me too. Your algorithm's fine; this is useful to know and points you further up the script. The question becomes, "Is the input what you think it is?" Is the file you're opening actually the file you think it is? For me at least, a common and frustrating state of affairs. :-)

    Perhaps consider what I do in these cases. I put in some debugging code that prints the first 10 lines of the file after it's been opened. Add loud delimiters to the start and end of each line, say, ***. This can help identify any issues with white space, newlines, etc. Have a good look. Is it, as lostjimmy suggests, empty?
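
A rough sketch of that kind of debugging code, assuming you also want to see tabs and line endings spelled out. `delimited_lines` is a hypothetical helper name, and the in-memory sample is only there so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $max lines from a filehandle, each wrapped
# in *** markers, with CR, LF and TAB rewritten as visible escapes.
sub delimited_lines {
    my ( $fh, $max ) = @_;
    my @out;
    while ( defined( my $line = <$fh> ) ) {
        $line =~ s/\r/\\r/g;    # carriage return -> literal \r
        $line =~ s/\n/\\n/g;    # line feed       -> literal \n
        $line =~ s/\t/\\t/g;    # tab             -> literal \t
        push @out, "***$line***";
        last if @out >= $max;
    }
    return @out;
}

# Demo on an in-memory sample; for the real file use something like
#   open my $fh, '<', 'CCDS.20090902.txt' or die "Can't open ccds file: $!";
my $sample = "#chromosome\tgene\r\n1\tSAMD11\r\n";
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
print "$_\n" for delimited_lines( $fh, 10 );
close $fh;
```

If the whole file comes back as one enormous "line", or nothing is printed at all, that tells you straight away whether to look at the line endings or at the file itself.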

    Good luck!