autostopping the debugger
Let's assume you want to use the debugger. Then the problem may be getting the debugger to break right after a warning has been issued.
The warnings contain a line number, which is fine, but when that line is part of a loop, this is not enough information. We want the debugger to stop in the current context. Only then can we examine the state of the program in the context that produced the problem.
So what to do?
I replace the warning handler ($SIG{__WARN__}) with my own handler that checks for the typical format of a Perl warning ("... at FILE line N."). I do that because I am only interested in warnings from the Perl interpreter. If this format is detected, the code branches into a special path where we set up the debugger. We want the debugger to stop back in the code where the warning was caused.
So we set the variable $DB::single, which makes the debugger single-step. This takes effect once the handler has returned.
After that setup step, the warning message is printed just as it was before the modification.
The signal handler code should go into the debugger initialization file .perldb. Then I do not have to modify the original source code.
This is the content of file .perldb (place it in the current or in the home directory):
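    # afterinit() is called by the debugger (perl5db.pl) once its
    # initialization has finished.
    sub afterinit {
        $::SIG{'__WARN__'} = sub {
            my $warning = shift;
            # Only react to warnings in the interpreter's format:
            # "... at FILE line N."
            if ( $warning =~ m{\s at \s \S+ \s line \s \d+ \. $}xms ) {
                $DB::single = 1;    # debugger stops automatically after
                                    # the line that caused the warning.
            }
            # warn() inside a __WARN__ handler does not call the handler
            # again, so the message is printed as usual.
            warn $warning;
        };
        print "sigwarn handler installed!\n";
        return;
    }

Note: This is mostly stolen from my writeups "Re: Debugging a program" and "Re^3: Debugging a program", but I think it fits better here.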
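To try the handler, a small script that warns inside a loop can be run under the debugger. The script name warn_in_loop.pl and its contents are made up for this illustration. Start it with perl -d warn_in_loop.pl and type c at the first prompt; with the .perldb above in place, the debugger should stop at the statement after the one that warned, with $i and $v still in scope for inspection.

```perl
#!/usr/bin/perl
# warn_in_loop.pl: a hypothetical script to exercise the .perldb handler.
#   perl -d warn_in_loop.pl     then type "c" at the first prompt
use strict;
use warnings;

my @values = (1, 2, undef, 4);
my $sum    = 0;
for my $i (0 .. $#values) {
    my $v       = $values[$i];
    my $doubled = 2 * $v;    # warns for $i == 2: "Use of uninitialized value $v in multiplication (*) ..."
    $sum += $doubled;        # the debugger is expected to stop here; try "p $i" or "x $v"
}
print "sum = $sum\n";
```

Without the debugger the script simply prints the warning and the sum, because an undefined value counts as 0 in the multiplication.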
The other structure needed is the longest common prefix (LCP) array. For each entry of the suffix array it holds the length of the longest prefix that this entry shares with the previous entry. For this example, the string "mississippi", the sorted suffixes look like this:

    Offset  Suffix
    ==================
        10: i
         7: ippi
         4: issippi
         1: ississippi
         0: mississippi
         9: pi
         8: ppi
         6: sippi
         3: sissippi
         5: ssippi
         2: ssissippi

The LCP values in suffix array order:

    Offset  LCP (prefix shown in ())
    ================================
        10: 0 ()
         7: 1 (i)
         4: 1 (i)
         1: 4 (issi) overlapping, 3 (iss) non overlapping
         0: 0 ()
         9: 0 ()
         8: 1 (p)
         6: 0 ()
         3: 2 (si)
         5: 1 (s)
         2: 3 (ssi)

Sorted by LCP value:

    Offset  LCP (prefix shown in ())
    ================================
         1: 4 (issi) overlapping, 3 (iss) non overlapping
         2: 3 (ssi)
         3: 2 (si)
         7: 1 (i)
         4: 1 (i)
         8: 1 (p)
         5: 1 (s)
        10: 0 ()
         0: 0 ()
         9: 0 ()
         6: 0 ()
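A minimal Perl sketch that reproduces the tables for "mississippi". It assumes a naive suffix array (all suffixes sorted with cmp, which is fine for small inputs but not linear; an algorithm such as SA-IS would be needed for O(n)) and uses Kasai's algorithm for the LCP array. The sub names suffix_array and lcp_array are my own.

```perl
use strict;
use warnings;

# Naive suffix array: sort all suffix start offsets by the suffix text.
sub suffix_array {
    my ($text) = @_;
    my @sa = sort { substr($text, $a) cmp substr($text, $b) } 0 .. length($text) - 1;
    return @sa;
}

# Kasai's algorithm: LCP of each suffix with its predecessor in @sa, in O(n).
sub lcp_array {
    my ($text, @sa) = @_;
    my $n = length $text;
    my @rank;
    $rank[ $sa[$_] ] = $_ for 0 .. $n - 1;    # inverse of the suffix array
    my @lcp = (0) x $n;
    my $h   = 0;                              # carried-over match length
    for my $i (0 .. $n - 1) {                 # walk the suffixes in text order
        if ($rank[$i] == 0) { $h = 0; next; }
        my $j = $sa[ $rank[$i] - 1 ];         # previous suffix in sorted order
        $h++ while $i + $h < $n && $j + $h < $n
            && substr($text, $i + $h, 1) eq substr($text, $j + $h, 1);
        $lcp[ $rank[$i] ] = $h;
        $h-- if $h > 0;                       # next suffix keeps at least h-1
    }
    return @lcp;
}

my $text = 'mississippi';
my @sa   = suffix_array($text);
my @lcp  = lcp_array($text, @sa);
printf "%2d: %d (%s)\n", $sa[$_], $lcp[$_], substr($text, $sa[$_], $lcp[$_]) for 0 .. $#sa;
```

The printed lines match the LCP table in suffix array order; the overlap correction for offset 1 is handled separately.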
Similarly, care must be taken that matches do not cross file boundaries. This can be avoided by capping an LCP value whenever the offset plus the LCP would extend past the end of a file.
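Both corrections, the one for overlapping occurrences and the one for file boundaries, could be sketched as a single capping function. The sub capped_lcp and the layout of its last argument (a sorted list of exclusive end offsets of each file within the concatenated string) are assumptions for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Cap a raw LCP value for two occurrences at offsets $p and $q so that
# they neither overlap each other nor run past the end of their file.
# $file_ends: array ref of exclusive file end offsets, sorted ascending.
sub capped_lcp {
    my ($p, $q, $lcp, $file_ends) = @_;
    my $len = min($lcp, abs($p - $q));                  # non-overlapping length
    for my $pos ($p, $q) {
        my ($end) = grep { $_ > $pos } @$file_ends;     # end of the file holding $pos
        $len = min($len, $end - $pos) if defined $end;
    }
    return $len;
}

# Offsets 4 and 1 in "mississippi" (a single file of length 11):
print capped_lcp(4, 1, 4, [11]), "\n";    # prints 3, the "iss" entry
```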
Some smaller repetitions hide inside the larger ones and can be ignored, for example:

    4: 1 (i)
    3: 2 (si)

Some matches are mutually exclusive: "ssi" at offset 2 overlaps "iss" at offset 1, so only one of them can be used. A good heuristic is to choose the candidate that removes the most copies:

    1: 3 (iss) non overlapping
    2: 3 (ssi)
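One possible reading of this heuristic, as a greedy sketch: rank the candidates by the number of characters that would disappear if all copies but one were replaced (length times the number of extra copies), and skip any candidate whose occurrences overlap text already claimed by a better one. The hash layout and the sub names saving and choose_candidates are hypothetical.

```perl
use strict;
use warnings;

# Characters saved if all copies but one were replaced.
sub saving {
    my ($c) = @_;
    return $c->{len} * (@{ $c->{offsets} } - 1);
}

# Greedy selection: best saving first, no two chosen matches may overlap.
sub choose_candidates {
    my @candidates = @_;
    my %claimed;     # text offset => 1 once covered by a chosen match
    my @chosen;
    for my $c (sort { saving($b) <=> saving($a) } @candidates) {
        my @cells = map { my $o = $_; ($o .. $o + $c->{len} - 1) } @{ $c->{offsets} };
        next if grep { $claimed{$_} } @cells;   # conflicts with a better match
        $claimed{$_} = 1 for @cells;
        push @chosen, $c;
    }
    return @chosen;
}

my @chosen = choose_candidates(
    { text => 'iss', len => 3, offsets => [1, 4] },
    { text => 'ssi', len => 3, offsets => [2, 5] },
    { text => 'p',   len => 1, offsets => [8, 9] },
);
print join(', ', map { $_->{text} } @chosen), "\n";
```

"iss" and "ssi" save three characters each, so one of them is taken and the other is skipped as overlapping; "p" does not conflict with either and is kept.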
Using these structures, one could find repetitions in the 'character stream' in O(n) time.
For analysing multiple files of Perl code, it is easiest to concatenate them into one long string. For reporting, the offsets from the suffix array must then be mapped back to file names and line numbers. A report should also be able to show more than two copies of the same original fragment.
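Concatenating the files and mapping offsets back could look like the following sketch. The subs concat_files and locate and the file names are hypothetical, and the file contents are kept in memory rather than read from disk so the example stays self-contained.

```perl
use strict;
use warnings;

# Join the sources into one string, remembering where each file starts.
sub concat_files {
    my (%content_of) = @_;    # file name => file content
    my $text = '';
    my @starts;               # [start offset, file name], ascending
    for my $name (sort keys %content_of) {
        push @starts, [ length($text), $name ];
        $text .= $content_of{$name};
    }
    return ($text, \@starts);
}

# Map an offset in the combined string to (file name, 1-based line number).
sub locate {
    my ($text, $starts, $offset) = @_;
    my $file = $starts->[0];
    for my $s (@$starts) {            # linear scan; binary search also works
        last if $s->[0] > $offset;
        $file = $s;
    }
    my $before = substr($text, $file->[0], $offset - $file->[0]);
    my $line   = 1 + ($before =~ tr/\n//);    # count newlines before the offset
    return ($file->[1], $line);
}

my ($text, $starts) = concat_files(
    'a.pl' => "print 1;\nprint 2;\n",
    'b.pl' => "my \$x;\nmy \$y;\n",
);
my ($file, $line) = locate($text, $starts, index($text, 'my $y'));
print "$file line $line\n";    # prints "b.pl line 2"
```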