Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Allo,

I've been going crazy with a program I'm writing using the BioPerl library. The problem could be my code, the public data, or the library, and I want to rule out myself before looking at the others. I do not want anyone to go learn the library to help with this, just to see whether I am making some kind of syntax mistake.

I have not yet found a "minimal test-case", because each run of the program on just two chromosomes takes 6 hours on my PC! I have already removed most of the "excess", though.

# includes
use strict;
use Bio::ClusterIO;

# locals
my $dir = $ARGV[0];
my %data;

# directory handling
if (-d !$dir) {
    print "Directory does not exist!\n",
          "Program Terminating....\n";
    exit();
}
opendir(DIR, $dir) || die "$dir: $!\n";
chdir($dir);
$dir =~ s/\\$//;

while (my $infile = readdir(DIR)) {
    if ((-T $infile) && ($infile =~ /^ds_ch(.{1,2})\.xml$/i)) {
        my %data;
        my $chromosome = $1;
        print "Processing $chromosome\n";
        my $parser = Bio::ClusterIO->new(
            -file   => $infile,
            -format => 'dbSNP'
        );
        while (my $record = $parser->next_cluster()) {
            if (my $class = $record->functional_class) {
                $class =~ s/^\s+//;
                $class =~ s/\s+$//;
                $data{$class}{$record->observed()}++;
            }
        }
        open(OUT, '>>', 'Results.txt');
        foreach my $class (keys(%data)) {
            foreach my $snp (keys(%{$data{$class}})) {
                print OUT "$chromosome\t$class\t$snp\t$data{$class}{$snp}\n";
            }
        }
        close(OUT);
    }
}

The problem is: even when a directory of 20 files is processed, the results file only contains the output from one file, or sometimes none. I have tried subsets of the data files (each about 1 GB), but no difference so far. Any help please!

Re: Sanity Check: Debugging Help Needed!
by VSarkiss (Monsignor) on Aug 27, 2003 at 01:03 UTC

    if (-d !$dir) { doesn't make much sense. You're asking if the negation of a string is a directory. You want

    if (! -d $dir) {
    or use unless as suggested above.
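
    To see why, here is a minimal sketch (the path is hypothetical): negating a non-empty string gives the empty string, so -d !$dir ends up testing -d '' rather than the path.

    use strict;
    use warnings;

    my $dir = 'some_dir';                            # hypothetical path
    print "!\$dir stringifies to '", !$dir, "'\n";   # prints: !$dir stringifies to ''
    print "No such directory\n" if ! -d $dir;        # the intended test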

      Thank you; unfortunately that did not fix the other problem, but I have corrected this error.

Re: Sanity Check: Debugging Help Needed!
by CombatSquirrel (Hermit) on Aug 27, 2003 at 00:44 UTC
    Try substituting the following:
    • Replace if (-d !$dir) { ... with unless (-d $dir) { ... -- you don't want the directory name used as a boolean value
    • Replace while (my $infile = readdir(DIR)) { ... with for my $infile (readdir(DIR)) { ... -- it's an array, not a filehandle (see the sketch after this list)
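
    Here is a minimal sketch of that second substitution, with a hypothetical directory name, just to show the two loop heads side by side:

    use strict;
    use warnings;

    my $dir = 'some_dir';    # hypothetical directory

    # Original form: readdir called once per iteration, in scalar context
    opendir(DIR, $dir) or die "$dir: $!\n";
    while (my $infile = readdir(DIR)) {
        print "while saw $infile\n";
    }
    closedir(DIR);

    # Suggested form: readdir called once, in list context, looped over with for
    opendir(DIR, $dir) or die "$dir: $!\n";
    for my $infile (readdir(DIR)) {
        print "for saw $infile\n";
    }
    closedir(DIR);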

    Those seem to be mistakes. See if it works for you afterwards.
    Hope this helped.
    CombatSquirrel.
    Entropy is the tendency of everything going to hell.

      Allo,

      Thank you for your response. I changed the directory-condition check. The readdir(DIR) change puzzles me somewhat: the results appear unchanged either way, and I tested this with simpler programs with identical results. When would I expect the while (readdir(DIR)) form to behave differently from the other?

        You are right, it does not have any effect. Shame on me. (_ducks_) ;-)
        Cheers, CombatSquirrel.
        Entropy is the tendency of everything going to hell.
Re: Sanity Check: Debugging Help Needed!
by BrowserUk (Patriarch) on Aug 27, 2003 at 01:00 UTC

    About the only thing that looks suspect to my eyes is that you are not checking the return code from open or close.

    I can't think of a situation in which an error in either would cause the open for append to blow away and overwrite the existing data, but that appears to be one possibility here. Is the output you are seeing in Results.txt from the first file processed, or the last?

    If it is the last, that would suggest that the '>>' mode is being ignored somehow, which seems unlikely.

    If the output is from the first file, that would tend to indicate that the subsequent attempts to re-open the file for append are failing for some reason. That seems unlikely, as you would be seeing lots of 'print() on unopened filehandle' messages--unless that use strict; migrated its way into the code just prior to posting :)

    Either way, try adding or die ...$!; clauses to the open and close calls and see what, if anything, that reveals.

    As an aside, have you checked to see that you have enough disk space free? You could even add an or warn ...$!; clause to your print statements to check for disk full errors etc.
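
    Something like this, as a minimal self-contained sketch using the same filename as your program; every I/O call reports why it failed via $! instead of failing silently:

    use strict;
    use warnings;

    open(OUT, '>>', 'Results.txt') or die  "open Results.txt for append: $!\n";
    print OUT "some line\n"        or warn "print to Results.txt: $!\n";   # e.g. disk full
    close(OUT)                     or die  "close Results.txt: $!\n";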


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      Hi BrowserUK,

      Thank you for your suggestions. It appears that only the first file is getting appended to the text file. This is true whether the file exists before the program executes or not.

      I also added return code checking like so:

      if (!open(OUT, '>>', 'Results.txt')) {
          print "Open file failed!\n";
          exit();
      }

      Unfortunately this did not change the program output. I also have a very large amount of disk space free (many GB). It is strange, too, that the data gets populated into the hash but does not get written to disk. I will try incrementing filenames (e.g. opening a new file on each loop iteration); if that works, then the problem must somehow be with the '>>' mode of the file-handle.
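
      Something like this, as a rough sketch, replacing the open/print/close block inside the while loop; the Results_<chromosome>.txt naming is just what I have in mind, and $chromosome and %data are the loop's existing variables:

      my $outfile = "Results_$chromosome.txt";   # one output file per chromosome
      open(OUT, '>', $outfile) or die "open $outfile: $!\n";
      foreach my $class (keys %data) {
          foreach my $snp (keys %{ $data{$class} }) {
              print OUT "$chromosome\t$class\t$snp\t$data{$class}{$snp}\n";
          }
      }
      close(OUT) or die "close $outfile: $!\n";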

        That is strange. Which version of perl are you using and what platform? If you posted the output from perl -V that would tell us the full story.

        Not that this ought to make a difference: if append mode were failing in some build or on some platform, I would have expected it to have been noticed already. But it might ring some bells with someone, somewhere, and provide a clue that helps.

        With respect to return-code checking, the usual idiom is

        open OUT, '>>', 'Results.txt' or die "Open for append to Results.txt failed. rc: $!";

        The two main differences here are that the error message will be output to STDERR rather than STDOUT, which can be useful for separating error output from 'normal' output via command-line redirection, and the inclusion of the perl special variable $!, which will contain an error message indicating why the operation failed. That is good information to have when trying to solve such failures.

        There's nothing wrong with the way you're doing it, but I'd advise the inclusion of $! in your output.
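
        Applied to your if-style check from above, that might look like this (still just a sketch; the only changes are reporting $! and sending the message to STDERR):

        if (!open(OUT, '>>', 'Results.txt')) {
            print STDERR "Open of Results.txt for append failed: $!\n";
            exit();
        }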

        None of that solves your problem though. I can't think of a situation where opening a file for append would succeed in opening it, but blow away the contents in the process. Maybe someone else has seen this?

        If it really is doing this, it might be worth trying the 2-arg form of open:

        open OUT, '>> Results.txt' or die ...;

        If your build of perl really does contain a bug that is affecting append mode, there is a slight chance that using a slightly different code path might change something.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
        If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Sanity Check: Debugging Help Needed! (Solved: Thank you!)
by Anonymous Monk on Aug 31, 2003 at 01:46 UTC

    Thank you all for the help you gave. The problem was the obvious one: too little memory (RAM). That is why only the first file, or sometimes no files, was processed successfully. When I moved to a machine with 2 GB of memory (from 1 GB), all worked well. How can I catch "out of memory" errors? No error was given to indicate that failure, which surprised me.
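
    One pattern I may try, as a hedged sketch: wrap each file's work in an eval block so that a trappable die (which some out-of-memory failures produce) is at least reported. A hard allocation failure can still abort perl outright before the eval sees anything, so this is best-effort only; the loop body below is just a placeholder.

    use strict;
    use warnings;

    foreach my $infile (@ARGV) {
        my $ok = eval {
            # ... parse $infile here, as in the original loop body ...
            1;
        };
        warn "Processing $infile failed: $@" unless $ok;
    }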