delirium has asked for the wisdom of the Perl Monks concerning the following question:

I was recently handed a project to take over some in-production code whose owner left the company, document it, fix any "bad stuff", and have my group start supporting it. The code is 6 Perl scripts and a shell script that runs them. I want to rewrite this as one Perl script and simplify everything, but I'm running into problems untangling which filename is being used for what at what time. There are lots of STDOUT and STDERR redirects in the shell script. For example:

FILE_LIST=`ls *.wrk`
for FILE in $FILE_LIST
do
    logit "COPYING $FILE TO $TGT_DIR"
    cp $FILE $TGT_DIR
    logit "PROCESSING $FILE FOR GROUP1"
    perl /some/other/script.pl <$FILE >>TMPOUT 2>>err.rpt
    rm -f $FILE
done
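(For the rewrite, that loop goes over to Perl almost mechanically. A sketch, with the per-file processing passed in as a code ref so it can be exercised without the real script.pl; logit and the directory names here are stand-ins, not the production values:)

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Basename qw(basename);

sub logit { print STDERR "@_\n" }   # stand-in for the shell's logit function

# Copy each *.wrk file in $src_dir to $tgt_dir, run a per-file action
# (the original shelled out to /some/other/script.pl with redirections
# here), then remove the source file -- same order as the shell loop.
sub process_wrk_files {
    my ($src_dir, $tgt_dir, $process) = @_;
    for my $file (glob "$src_dir/*.wrk") {
        logit("COPYING $file TO $tgt_dir");
        copy($file, "$tgt_dir/" . basename($file)) or die "copy $file: $!";
        logit("PROCESSING $file");
        $process->($file);
        unlink $file or warn "unlink $file: $!";
    }
}
```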

So something happens to the files in the CWD, the output is concatenated to TMPOUT, and later more processing is done on the files copied to $TGT_DIR, which are just the original .wrk files. Like so:

cd $TGT_DIR
FILE_LIST=`perl /rename/by/id.pl $FILE_NAME`
for FILE in $FILE_LIST
do
    if perl /some/compliance/checks.pl $FILE 2>/dev/null >$FILE.ERR
    then
        cp -f $FILE archive
        cp -f $FILE out
    else
        logit "$FILE WAS MALFORMED:\n `cat $FILE.ERR`"
        mail -s "FILE $FILE INCOMPLETE, NOT SENT" $NOTIFY <<EOF
FILE $FILE NOT SENT BECAUSE:
`cat $FILE.ERR`
EOF
    fi
done
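(In Perl, the pass/fail branch of that loop might factor out like this, with the compliance check and the notification step passed in as code refs -- the originals shell out to checks.pl and to mail; every name below is a stand-in:)

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Basename qw(basename);

# Run a compliance check on $file; on success copy it to the archive and
# out directories, on failure hand the error text to a notifier (the
# original logit'd and mailed $NOTIFY the contents of $FILE.ERR).
sub dispatch_file {
    my ($file, $check, $on_fail, $archive, $out) = @_;
    my ($ok, $err) = $check->($file);
    if ($ok) {
        my $name = basename($file);
        copy($file, "$archive/$name") or die "copy to $archive: $!";
        copy($file, "$out/$name")     or die "copy to $out: $!";
        return 1;
    }
    $on_fail->($file, $err);
    return 0;
}
```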

I'm wondering if my time would be better spent creating a Perl script to replace the shell script, and import the Perl scripts as functions, or tracing everything that each script is doing, and build a new script that reduces everything to the essentials. For example, /some/other/script.pl and /some/compliance/checks.pl both do compliance checks for different groups in my company, and copy files if they pass to the appropriate directories. These could be combined into one script that compliance checks everything in one pass of the file, checks the IDs on the fly and does what /rename/by/id.pl would normally do.
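If I go that route, one shape for the combined single-pass checker is a dispatch table of per-record-type subs, so every group's rules run in one read of the file. A sketch with made-up rules (the real checks live in script.pl and checks.pl, which I haven't untangled yet):

```perl
use strict;
use warnings;

# Hypothetical per-record-type checks: each takes the split fields and
# returns an error string, or undef if the record passes.
my %check_for = (
    '01' => sub { @_ >= 4 ? undef : 'header record too short' },
    '02' => sub { undef },   # detail records: no rules in this sketch
);

# One pass over the file: route each record to its type's check sub.
sub compliance_scan {
    my ($fh) = @_;
    my @errors;
    while (my $line = <$fh>) {
        chomp $line;
        my ($type) = $line =~ /^(\d{2})/ or next;
        my $check  = $check_for{$type}   or next;
        my $err    = $check->(split /,/, $line);
        push @errors, "$type: $err" if defined $err;
    }
    return @errors;   # empty list means the file passed
}
```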

I could spend a week or so trimming the 500 line monstrosity down to around 100 lines of just-the-essentials, or I could spend two days refactoring and documenting, and leave it looking like a giant, bloated mess. Each Perl script is also its own breed of hell, e.g.:

$FILENAME=shift;
($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat($FILENAME);
($fsec,$fmin,$fhour,$fmday,$fmon,$fyear,$fwday,$fyday,$fisdst) = localtime($mtime);
$fyear=$fyear+1900;
$fmon++;
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$year=$year+1900;
$mon++;
open (INFILE,$FILENAME);
while (<INFILE>) {
    chomp;
    if (m/(^[0-9]{2})/) {
        $RecordCount++;
        $STATS{"$1"}++;
        @Record=split(",",$_);
        if ($Record[0] eq '01') {
            $CREATE_DATE="20$Record[3]";
            $DATESTRING=sprintf("%04d%02d%02d",$year,$mon,$mday);
            if (($DATESTRING-$CREATE_DATE)>2){
                $WARNINGS{"01 Date looks old"}++;
            };
        };
        # etc.

Nice date delta function. Need I go on? I don't want code like that being my code and to be charged with supporting it. My question to the community is: What would you do? Have you ever been in a similar situation? How did it turn out?
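(For the record, the core Time::Local module gives a real day delta; subtracting YYYYMMDD strings numerically makes 20040101 - 20031230 come out as 8871 rather than 2. A sketch of what I'd replace it with, not the production fix:)

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Day delta between two YYYYMMDD strings, computed via real epoch seconds.
sub days_between {
    my ($ymd_a, $ymd_b) = @_;
    my @epoch;
    for my $ymd ($ymd_a, $ymd_b) {
        my ($y, $m, $d) = $ymd =~ /^(\d{4})(\d{2})(\d{2})$/
            or die "bad date: $ymd";
        # Use noon UTC so DST shifts can't skew the difference.
        push @epoch, timegm(0, 0, 12, $d, $m - 1, $y);
    }
    return int(($epoch[0] - $epoch[1]) / 86400);
}
```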

Replies are listed 'Best First'.
Re: Refactor and simplify
by idsfa (Vicar) on Oct 05, 2003 at 17:08 UTC

    Escalate to your manager. Go back and tell him what you just told us, but phrase it as, "How do you want me to spend my time as a resource?" If, as you say, they intend for the code to be supported indefinitely, you can lean on the unsupportability of the current code as a reason for him to give you the time to do it right.

    Of course, then you have to deliver. Especially if you haven't gotten to run a project before, this is a great opportunity to show you are sensitive to the needs of the business as well as worthy of greater responsibility.

    (Gaack ... I hate it when I fall into business-speak)


    Remember, when you stare long into the abyss, you could have been home eating ice cream.
Re: Refactor and simplify
by toma (Vicar) on Oct 05, 2003 at 17:53 UTC
    I've had good results when I followed these steps:
    • Bring the code into a revision control system.
    • Build test data and a test system so that proper operation can be verified with an automated test.
    • Create an independent test environment, do not change production code.
    • Fix the worst parts of the code, using test data to make sure that the new system gets the same results as the old system.
    You will almost certainly find bugs and want to fix them. Use a boss-approved release process and give users warning that the system is changing.

    Get users to test the improvements before release.

    If you can't develop a realistic test system, you will have to work much harder and develop many test cases that verify that you aren't changing anything. With this increase in level of difficulty, it might not be justifiable to recode the system.
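    That verification can be mechanised: run every captured input through both the old and the new command and diff the combined output. A sketch (the commands below are placeholders, not the poster's actual scripts):

```perl
use strict;
use warnings;

# Run two commands over the same inputs and report any input where
# their combined stdout/stderr differ.
sub regression_check {
    my ($old_cmd, $new_cmd, @inputs) = @_;
    my @failures;
    for my $in (@inputs) {
        my $old = qx($old_cmd < $in 2>&1);
        my $new = qx($new_cmd < $in 2>&1);
        push @failures, $in if $old ne $new;
    }
    return @failures;   # empty list: the rewrite matches the original
}
```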

    It should work perfectly the first time! - toma
Re: Refactor and simplify
by BrowserUk (Patriarch) on Oct 05, 2003 at 20:52 UTC

    Two stories:

    1. 900 lines of COBOL, structured like assembler, that had been APARd for correction 4 times with each 'correction' having spawned a new APAR in the field.

      After two weeks of trying to figure out where the error lay, I re-wrote the routine from scratch, reducing it to < 200 lines in the process. Regression testing showed that it fixed the current APAR and hadn't re-instated any of the earlier ones. All tests passed.

      Submitted for approval, the change was rejected as "too extensive" and I was instructed to reduce the volume of the change by reverting to the 900 lines and changing only those that required changing. When I explained that the original code was so complex and simply had too many branches and special cases for me to work out which combination of factors caused the original problem, the APAR was closed as "Un-reproducible"!

    2. A real-time system for capturing large volumes of data would occasionally, but regularly, balk at a particular set of circumstances in an input stream and crash. Several people had spent time trying to understand why this particular set of inputs caused the problem, but had failed to find the root cause.

      I suggested a fix: pre-filter the input, detect the circumstances in question, and adjust the structure to an equivalent but different structure that was known not to cause the problem. This went into production while the original code was investigated further.

      The problem was eventually tracked down to the depth of recursion required to process the structure of the input causing a stack frame (limited to 64k by the 386 segmented architecture) to overflow. The "right way" to fix the problem was to re-compile the library to use the huge model, but various assumptions in the original design meant that this entailed an almost complete re-write of the entire system.

      A very pragmatic decision (by an enlightened manager) was taken to generalise the pre-filter so that on the rare occasions when the structure of the input became so complex as to push the recursive routine into stack overflow, the structure was converted to two simpler, equivalent structures. This effectively doubled the headroom and prevented the problem from occurring. It was an easy patch, but saved a huge amount of re-development effort, time and crucially money.

      Oh, that all management decisions were so pragmatic.

    Put your idea and reasons to your management. Preferably, grab some production input and output data and set up your testing to show that your refactoring replicates the production code in every way -- including replicating any bugs that it may have as a first pass. Once your refactoring can be used as a replacement for the status quo, it becomes much easier to tackle any bugs it has one by one and ensure that fixing them does not have undesirable knock-on effects on other parts of the system.

    Good luck, and may your management be pragmatic.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Refactor and simplify
by delirium (Chaplain) on Oct 07, 2003 at 15:54 UTC
    Thanks for the suggestions (and the stories). The versioning system made sense as a way to approach this particular problem. On my side was the fact that management and my fellow coders agreed that the code in question was ugly, rushed, and unmaintainable. (Of course, it isn't too hard to persuade a coder that someone else's code is horrible.)

    Our site does have separate development and test systems for fooling with code before it goes live. I was able to procure some test data to run through the old scripts, index the results, and use them to make sure my modified code produced identical results.

    I'm up to revision 6 now, with the main shell script rewritten as Perl, and only two scripts left to convert to functions. Thanks for everyone's input.