perlbrahmin has asked for the wisdom of the Perl Monks concerning the following question:

I do not have any special script to post here, nor any special output. This is peculiar behavior by Perl, and I will describe it in detail.

I have an innocent print statement in a Perl script that is behaving badly. I am reading a plain text file line by line, adding some information to each line, and printing the new line to an output file. The input file has 240303 lines and the output file has 240063, so a small fraction of the lines are unexpectedly missing from the output.

I found out which lines were missing, and made the script pause right after it had printed one of them. I could clearly see that the output file HAD that line, though improperly terminated: it didn't have the "\n", and some characters at the end of the file were missing. I did this many times, and the same characters were missing every time. But when I let the script finish reading the whole input file and then opened the output file, the line that had been there before was gone, and its place (identified by its line number in the output file) was taken by the next line. This happened every time.

This is not a very well-written script, but I am still writing down what I did, to give you an idea.
#!/usr/bin/perl
use strict;
use warnings;

my $microsats = $ARGV[0];
my $orths = $ARGV[1];
open (MIC,"<$microsats") or die "Cannot open file $microsats: $!";
open (ORTHS,">$orths") or die "Cannot open file $orths: $!";
my %starthash=();
my %preused = ();
my $startcord = 5;
my $endcord = 7;

# example of input line:
# "NA1182988952620.b.scf dinucleotide AG 555 65 6000 + : 6002 a-g"

while (my $line = <MIC>){
    chomp $line;
    my @fields = split(/\t/,$line);
    push @{$starthash{$fields[$startcord]}} , $line;
}

while (my $line = <MIC>){
    chomp $line;
    next if exists $preused{$line};
    $preused{$line} = 1;
    my @fields = split(/\t/,$line);
    my @finalstatement = ();
    push @finalstatement, $line;
    my $searchstart = $fields[$startcord]-1;
    my $searchend = $fields[$endcord]+1;
    my $printer =0;
    $printer = 1 if $searchstart <= 1297 && 1297 <= $searchend; # turning $printer on at a line that is having the stated problem
    for (my $i = $searchstart; $i<= $searchend; $i++){
        if (exists $starthash{$i}){
            my @orthologous = @{$starthash{$i}};
            delete $starthash{$i};
            foreach my $single (@orthologous){
                next if exists $preused{$single};
                $preused{$single} = 1;
                push @finalstatement, $single;
                my @sields = split("\t",$single);
                $searchend = $sields[$endcord] + 1 if ($sields[$endcord] + 1) > $searchend;
                $i = $sields[$startcord] - 2 if ($sields[$startcord] - 1) < $searchstart;
                $searchstart = $sields[$startcord] - 1 if ($sields[$startcord] - 1) < $searchstart;
            }
        }
    }
    my $final = join("\t", @finalstatement)."\n";
    print ORTHS $final;
}

Re: Perl overwriting some lines
by jethro (Monsignor) on Nov 17, 2009 at 14:42 UTC

    Check the date on the output file. My guess is that that file was produced by a previous version of your script and then wasn't written to for a long time.

    Whether that is the case or not, you might try to enhance your bug searching a bit:

    Instead of putting in code that pauses at a specific point in time (what you tried with the $printer variable) and then checking the output file, you should inspect your script at that time. Use the Perl debugger and set a breakpoint with a condition at a relevant line. In your case you would do something like the following:

    perl -d <yourscript> <parameters>

    h gives you a short help screen; you can find out more with h <command>:

    h

    l lists a few lines of your script; repeat it until you find the line you want to stop at (or enter v 50 to view a window of lines around line number 50):

    l
    l
    l

    Let's assume line 32 is where you want to stop, but only if $searchstart is 1297:

    b 32 $searchstart==1297
    c

    The script now runs until it stops directly before executing line 32 when $searchstart has the value 1297. If it does not stop, that line either was never executed or $searchstart never had the value 1297 at that line. Alternatively you could use "w $searchstart==1297", which stops when the value of that expression changes, i.e. the moment the condition becomes true.

    Now you can single-step

    s
    s

    and print any variables:

    p $startcord
    p @orthologous
    p $starthash{$i}

    You see, the Perl debugger is really not that difficult to use.

    The alternative to the debugger is to put temporary print statements in your code to check the variables. For simple tests that might even be faster; for serious bug fixing it is more work.
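
A minimal sketch of the temporary-print approach, mirroring the 1297 window check from the posted script. The helper name debug_message is hypothetical and not part of the original code; the message goes to STDERR so that it stays out of the output file.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns a debug message
# when $watch lies inside the window [$searchstart, $searchend], else undef.
sub debug_message {
    my ($watch, $searchstart, $searchend, $line) = @_;
    return unless $searchstart <= $watch && $watch <= $searchend;
    return "DEBUG: window $searchstart..$searchend covers $watch, line: $line\n";
}

# Inside the second loop of the script one might write:
#   my $msg = debug_message(1297, $searchstart, $searchend, $line);
#   print STDERR $msg if defined $msg;
my $msg = debug_message(1297, 1290, 1300, "example line");
print STDERR $msg if defined $msg;
```

Once the bug is found, the temporary print line can simply be deleted again.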

      Thanks for the enlightenment. I was not aware that the Perl debugger can do all this. I will use it and report back if the problem persists.
Re: Perl overwriting some lines
by JavaFan (Canon) on Nov 17, 2009 at 13:55 UTC
    I'm surprised you're getting any output. Your first while loop iterates over the entire file, populating %starthash. By the time the second while loop is reached, there's nothing more to read, so its body is never executed. But that's where the print statement is.
      Hi. I apologize for this. When posting, I removed the close(input file) and the second open(input file) commands by mistake. My actual script does do that, so that's not the real problem. Again, my apologies. I will be more careful.
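
For reference, a small sketch of reading the same file in two passes. It uses seek to rewind the handle rather than the close and re-open described above; the sub name two_pass_line_counts and the lexical handle are assumptions for illustration, not code from the original script.

```perl
use strict;
use warnings;

# Reads the file twice through one handle and returns the number of
# lines seen in each pass.
sub two_pass_line_counts {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open file $path: $!";

    my $first = 0;
    while (my $line = <$fh>) { $first++ }

    # Without this rewind (or a close followed by a fresh open), the
    # second loop would hit end-of-file immediately and never run.
    seek($fh, 0, 0) or die "Cannot rewind $path: $!";

    my $second = 0;
    while (my $line = <$fh>) { $second++ }

    close($fh);
    return ($first, $second);
}
```

If both counts match, the second pass really did see the whole file again.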
Re: Perl overwriting some lines
by graff (Chancellor) on Nov 17, 2009 at 19:07 UTC
    If you are expecting that your output file should always have the same number of lines as your input file, then there is something wrong with this part of your second while loop:
    while (my $line = <MIC>){
        chomp $line;
        next if exists $preused{$line};
        $preused{$line} = 1;
        ...
    If the input file happens to contain any duplicate lines, every non-initial occurrence of a duplicated string will be missing from the output.

    Things are added to the "%preused" hash at another point within that loop as well (although I don't really understand how that other part is supposed to work).

    You should be looking at the input file to figure out: (a) how many duplicate lines there are, and (b) how many lines might get eliminated due to the conditions that cause other things to be added to %preused.
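
A rough sketch for point (a), counting exact duplicate lines in the input. The sub name duplicate_summary and the sample data are made up for illustration; point (b) depends on the script's own merging logic and is not covered here.

```perl
use strict;
use warnings;

# Returns (total lines, distinct lines, lines that repeat an earlier one).
sub duplicate_summary {
    my ($fh) = @_;
    my %seen;
    my $total = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $total++;
        $seen{$line}++;
    }
    my $distinct = scalar keys %seen;
    return ($total, $distinct, $total - $distinct);
}

# Example with an in-memory file handle standing in for the real input file.
my $sample = "x\t1\ny\t2\nx\t1\n";
open(my $in, '<', \$sample) or die "Cannot open in-memory handle: $!";
my ($total, $distinct, $repeats) = duplicate_summary($in);
print "total=$total distinct=$distinct repeats=$repeats\n";
```

With the real input, one would open the microsatellite file instead of the in-memory sample and compare the number of repeats against the number of missing lines.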

Re: Perl overwriting some lines
by przemo (Scribe) on Nov 17, 2009 at 13:57 UTC

    ...and the question is?

    If you expect monks to help, you need to provide some input on which this doesn't work -- a minimal sample of it.