travisbickle34 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have written a script that reads in a file full of entries and makes either a present or an absent call for each one, based on the constituents of the entry. If a particular entry is EVER called present, it is subsequently always output as present. To store the data I was using a DBM::Deep hash, with each key paired with a 2-element anonymous array (one element for the present count and one for the absent count). The script works fine with small numbers of entries. However, with large numbers of entries (approx. 50,000), once the counts reach around 3 or 4 they begin to reset to 1. I tried an alternative version that uses 2 separate hashes (1 for present counts and 1 for absent counts), but it's too slow to use. Can anyone suggest why the original version's counts reset, or an alternative way I could implement the code?

This is what it looks like:

#!/usr/bin/perl
use strict;
use warnings;
use DBM::Deep;
use Getopt::Std;

# Check for pre-existing output files and die if they exist
if (-e "present" || -e "absent") {
    die "Remove existing output files before running script!";
}

# Define command syntax for output to screen in case of user error
my $syntax = "\nCommand Syntax: \n\nihcrdb -i <input filename> -b <background>\n\n";

# Define hash for storage of command line arguments and define
# single-letter switches to accept
my %arghash = ();
getopts("i:b:", \%arghash);

# If all necessary arguments are not defined on command line, die with
# error message and syntax output to screen
unless (defined $arghash{i} && defined $arghash{b}) {
    die "Insufficient command line arguments supplied! Quitting...\n$syntax";
}

# Define input file and background threshold: assign relevant arghash values to them
my ($input, $background) = ($arghash{i}, $arghash{b});

# Define scalar variable to hold ref to Deep DB
my $db = DBM::Deep->new("CRDB");

# Get hash from DB (empty on the first run, when nothing has been stored yet)
my %pahash = %{ $db->{hash} || {} };

# Open input file or die
open (INPUT, '<', $input) or die "Cannot open infile! $!";

# Enter while loop for file parse
while (<INPUT>) {

    # Skip blank lines, header and Affy control lines
    next if (/^\s*$/) || (/^Gene/) || (/^AFFX/) || (/^2000/);

    # Chomp line, split on tabs and assign to array
    chomp;
    my @linearray = split /\t/, $_;

    # Extract 3 required values
    my $name     = shift @linearray;
    my $signal   = shift @linearray;
    my $affycall = shift @linearray;

    # Increment present count for sequence if above bg and present,
    # else increment absent count
    if ($affycall eq "P" && $signal > $background) {
        $pahash{$name}->[0]++;
    }
    else {
        $pahash{$name}->[1]++;
    }
}

# Open present and absent output files
open (PRESENT, '>', "present") or die "Cannot open present file! $!";
open (ABSENT,  '>', "absent")  or die "Cannot open absent file! $!";

# Print sequence name and number of calls to output files.
# Output as present if EVER called present
foreach my $key (sort keys %pahash) {
    if (defined $pahash{$key}->[0]) {
        print PRESENT "$key\t$pahash{$key}->[0] present calls";
        if (defined $pahash{$key}->[1]) {
            print PRESENT "\t$pahash{$key}->[1] absent calls\n";
        }
        else {
            print PRESENT "\n";
        }
    }
    else {
        print ABSENT "$key\t($pahash{$key}->[1] absent calls)\n";
    }
}

# Reassociate updated hash with stored DB
$db->{hash} = \%pahash;

# Close all files and exit
close INPUT;
close PRESENT;
close ABSENT;
exit;
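For comparison, the tallying rule described in the question can be sketched with a plain in-memory hash and no DBM layer at all. This is only an illustration, not the poster's script: the tally_calls helper, the sample records, and the threshold of 100 are all made up, and it assumes the whole data set fits in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally present/absent calls per name in a plain hash.
# Each record is [name, signal, call]; $background is the signal threshold.
sub tally_calls {
    my ($background, @records) = @_;
    my %counts;    # name => [present_count, absent_count]
    for my $rec (@records) {
        my ($name, $signal, $call) = @$rec;
        $counts{$name} ||= [0, 0];    # start both counters at zero
        if ($call eq 'P' && $signal > $background) {
            $counts{$name}[0]++;
        }
        else {
            $counts{$name}[1]++;
        }
    }
    return \%counts;
}

# Made-up sample data for illustration only
my $counts = tally_calls(
    100,
    ['geneA', 250, 'P'],
    ['geneA', 50,  'P'],
    ['geneB', 300, 'A'],
    ['geneA', 400, 'P'],
);

# A name is reported as present if it was EVER called present
for my $name (sort keys %$counts) {
    my ($present, $absent) = @{ $counts->{$name} };
    my $status = $present > 0 ? 'present' : 'absent';
    print "$name\t$status\t$present present calls\t$absent absent calls\n";
}
```

Initialising both counters to zero means the output loop can test the present count numerically instead of relying on defined().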

Replies are listed 'Best First'.
Re: DBM::Deep Problems/Alternatives
by Anonymous Monk on May 19, 2005 at 11:26 UTC
    My wisdom is not deep enough to answer your problem, but you could use Data::Dumper (print Dumper($yourValues);) to check every value you set or store in your arrays and hashes. Maybe the cause is something other than what you think. Good luck.
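    A minimal sketch of that kind of check, assuming a hash of two-element arrays like the one in the question (the sample keys and values here are invented):

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort keys so successive dumps are easy to compare by eye
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Invented sample data: name => [present_count, absent_count]
my %pahash = (
    geneA => [2, 1],
    geneB => [undef, 3],
);

# Pass a reference so the whole structure is dumped as one value
my $dump = Dumper(\%pahash);
print $dump;
```

    Printing such a dump inside the parse loop for a single suspect key would show at which input line a count changes unexpectedly.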
      Thanks. After much tearing out of hair, I gave the MLDBM module a try instead. The code now appears to be functioning perfectly. It must have been a DBM::Deep bug after all. :-)
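The MLDBM version is not shown in the thread. One point worth knowing with stores that serialise nested values into a DBM file, as MLDBM does, is that a nested element cannot be modified in place: the value has to be fetched, changed, and stored back. The sketch below illustrates that fetch-modify-store pattern using only core modules (SDBM_File and Storable) rather than MLDBM itself; the bump helper, the file name, and the sample keys are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;                        # O_CREAT, O_RDWR
use SDBM_File;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

# Keep the DBM files in a scratch directory that is removed on exit
my $dir = tempdir(CLEANUP => 1);
tie my %store, 'SDBM_File', "$dir/counts", O_CREAT | O_RDWR, 0644
    or die "Cannot tie SDBM file: $!";

# Hypothetical helper: increment one counter of a stored [present, absent] pair
sub bump {
    my ($name, $index) = @_;
    my $pair = exists $store{$name} ? thaw($store{$name}) : [0, 0];    # fetch
    $pair->[$index]++;                                                  # modify
    $store{$name} = freeze($pair);                                      # store back
}

bump('geneA', 0) for 1 .. 5;    # five present calls
bump('geneA', 1);               # one absent call
bump('geneB', 1);               # one absent call

# Copy the results into a plain hash before releasing the DBM file
my %result;
for my $name (sort keys %store) {
    $result{$name} = thaw($store{$name});
    print "$name\t$result{$name}[0] present calls\t$result{$name}[1] absent calls\n";
}
untie %store;
```

Every update goes through a full read and write of the stored value, which keeps the on-disk copy authoritative at the cost of some speed.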