in reply to nmr comparison script

Like most other people, I was going to skip over your post because of the length of the code and the appalling layout. But then my phone line was disconnect for 4 hours which stopped me doing what I wanted to do, and your code sat on the screen in front on me so I played whilst waiting for the phone to come back.

A few steps to improving your chances of fixing your code, and to getting some help if you can't do it yourself.

  1. 700+ lines reduced to 450

    The first thing I noticed about your code was the appalling layout.

    Admittedly, about half of that was contributed by the PM codewrap 'feature'. I've had to break lines in this reply in stupid places to stop PM from complete screwing the formatting.

    I fixed this by using perl to parse and reformat the whole thing.

    P:\test>perl -MO=Deparse 376142-dl.pl >376142.pl 376142-dl.pl syntax OK

    Try it. You will be pleasently surprised. Consistant indentation makes reading code so much easier.

  2. 450 reduced to 250

    Next was that this block of code

    if ($line =~ /$atomid.*\s*\w+\s+$resid\s+$atomidmatch.* \w+\s+$residmatch\s+\d\ +./ or $line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d +\./) { print OUTFILE "array2[$el]\n"; last; }

    which is repeated, exactly, 36 times. So, I moved that into a subroutine.

    sub outputIf { my( $line, $atomid, $atomidmatch, $residmatch, $resid, $array2_el, $outfile) = @_; if ($line =~ /$atomid.*\s*\w+\s+$resid\s+$atomidmatch.* \w+\s+$residmatch\ +s+\d\./ or $line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$resid\ +s+\d\./ ) { print $outfile "array2_el\n"; return 1; } return 0; }

    and replaced each occurance in the main code with

    last if outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el], +\*OUTFILE );
  3. 250 reduces to 125

    That done, it becomes immediately apparent that the main bulk of the program now consists of six, nearly identical repetitions of

    if ($array2[$el + 10] =~ /resid (\d+) and name (H\w+\d*).?\)/ and $array2[$el ] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; if ($array2[$el + 10] =~ /resid (\d+) and name (H\w+\d*).\n?\)/) { $residmatch = $1; $atomidmatch = $2; } die "Can't open $distfil\ne" unless open DISTFILE, "$distfile"; while (defined($_ = <DISTFILE>)) { $line = $_; last if outputIf( $line, $atomid, $atomidmatch, $residmatch, $resid, $array2[$el] ); if ($array2[$el + 11] =~ /resid (\d+) and name(H\w+\d*).?\)/) +{ last if outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el] ); } elsif ($array2[$el + 12] =~ /resid (\d+) and name(H\w+\d*).?\) +/) { last if outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el] ); } elsif ($array2[$el + 13] =~ /resid (\d+) and name(H\w+\d*).?\) +/) { last if outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el] ); } elsif ($array2[$el + 14] =~ /resid (\d+) and name(H\w+\d*).?\) +/) { last if outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el] ); } elsif ($array2[$el + 15] =~ /resid (\d+) and name(H\w+\d*).?\) +/) { last if outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el] ); } } }

    So, by sticking that into a subroutine your main for loop reduces to

    @array2 = <KATFILE>; chomp @array2; foreach $el ( 0 .. @array2 ) { next if whileif( \@array2, $el, 10, $distfile, $outfile ); next if whileIf( \@array2, $el, 11, $distfile, $outfile ); next if whileIf( \@array2, $el, 12, $distfile, $outfile ); next if whileIf( \@array2, $el, 13, $distfile, $outfile ); next if whileIf( \@array2, $el, 14, $distfile, $outfile ); next if whileIf( \@array2, $el, 15, $distfile, $outfile ); print OUTFILE "$array2[$el]\n"; }
  4. Besides the stupid name, that subroutine is repetitious. A little juggling renders

    sub whileIf { my( $arrayRef, $el, $offset, $distfile, $outfile ) = @_; if ($array2[$el + $offset] =~ /resid (\d+) and name (H\w+\d*).?\)/ + and $array2[$el ] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; die "Can't open $distfil\ne" unless open DISTFILE, "$distfile" +; WHILELOOP: while( defined( $line = <DISTFILE> ) ) { for my $offset2 ( 0 .. 5 ) { last WHILELOOP if $array2[$el + $offset] =~ /resid (\d+) and name (H\w+\d*).\n?\)/ and outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el], $outfile ) + ; } } return 1; } return 0; }
  5. Down to under 100 lines.

    The modified program

    $katfile = '/remote/belgarath/tpukala/pegasus/xplor/signif/25jun2.kat2 +'; $outfile = 'temp.kat'; die "Can't open $outfile" unless open OUTFILE, ">$outfile"; die "Can't open $katfile" unless open KATFILE, "$katfile"; @array = <KATFILE>; chomp @array; $numelements = @array; @arraylist = 0 .. $numelements; foreach $el (@arraylist) { if ($array[$el] =~ /^\)$/ and $array[$el + 1] =~ /^\($/) { print OUTFILE ")\n*\n*\n*\n*\n*\n*\n*\n"; } elsif ($array[$el] =~ /assign/ and $array[$el + 1] =~ /^\($/) { print OUTFILE "assign\n*\n*\n*\n*\n*\n*\n*\n*\n*\n*\n"; } else { print OUTFILE "$array[$el]\n"; } } close KATFILE; close OUTFILE; $distfile = "/remote/belgarath/tpukala/pegasus/xplor/signif/25jun/rnk_ +2\n5jun_signif_.dst"; $katfile = '/remote/belgarath/tpukala/pegasus/xplor/signif/temp.kat'; $outfile = 'temp2.kat'; die "Can't open $outfile" unless open OUTFILE, ">$outfile"; die "Can't open $katfile" unless open KATFILE, "$katfile"; @array2 = <KATFILE>; chomp @array2; foreach $el ( 0 .. @array2 ) { whileif( \@array2, $el, $_, $distfile, \*OUTFILE ) and next for 10 + .. 15; print OUTFILE "$array2[$el]\n"; } close KATFILE; close OUTFILE; $katfile = '/remote/belgarath/tpukala/pegasus/xplor/signif/temp2.kat'; $outfile = 'half.kat'; die "Can't open $outfile" unless open OUTFILE, ">$outfile"; die "Can't open $katfile" unless open KATFILE, "$katfile"; @array3 = <KATFILE>; chomp @array3; $numelements3 = @array3; @arraylist3 = 0 .. $numelements3; foreach $el (@arraylist3) { if ($array3[$el] =~ /^\*$/) { (); } elsif ($array3[$el] eq $array3[$el + 1]) { (); } else { print OUTFILE "$array3[$el]\n"; } } ## Exit main program exit; sub outputIf { my( $line, $atomid, $atomidmatch, $residmatch, $resid, $array2_el, $outfile ) = @_; if ($line =~ /$atomid.*\s*\w+\s+$resid\s+$atomidmatch.* \w+\s+$residmatch\s ++\d\./ or $line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$resid\ +s+\d\./ ) { print $outfile "array2_el\n"; return 1; } return 0; } sub whileIf { my( $arrayRef, $el, $offset, $distfile, $outfile ) = @_; if ($array2[$el + $offset] =~ /resid (\d+) and name (H\w+\d*).?\)/ + and $array2[$el ] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; die "Can't open $distfil\ne" unless open DISTFILE, "$distfile" +; WHILELOOP: while( defined( $line = <DISTFILE> ) ) { for my $offset2 ( 0 .. 5 ) { last WHILELOOP if $array2[$el + $offset] =~ /resid (\d+) and name (H\w+\d*).\n?\)/ and outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el], $outfile ) + ; } } return 1; } return 0; }

    Still far from perfect. It almost certainly doesn't fix your problem--and given the absence of my ability to test it without the appropraite data, it may well not even do what your original does--but had you posted this code, you would probably have got a lot more interest from people in trying to help you with your problem. Whenever you find yourself copy&pasting code, do so once--into a subroutine.

    Note: I've stuck with teh stupid subroutine names, because I don;t know enough about what your program is doing to choose good ones.

Give me a break! By the way, until about a week ago i had never heard of perl and had no idea whatsoever about computer programming so please be nice!!

Programming is hard. If I walked in to your lab and told you that I had lifted random bits of technique from these papers: alopecia, Mutations cause extra limbs in amphibians, in my attempt to create a 4-legged bald chicken for the food industry, but it didn't work and asked for your help.

What would your reaction be?

Whomever expects you to pick up programming in one week and do something useful with it, whether that is your boss, your lecturer, lab mentor or yourself, is doing you no favours at all. Most programmer spend years learning their craft, trying do it in a week is just insulting.

I'm not trying to discourage you, but get some training.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

Replies are listed 'Best First'.
Re^2: nmr comparison script
by graff (Chancellor) on Jul 22, 2004 at 03:03 UTC
    That is truly a beautiful and impressive demonstration of code analysis and reduction. It reminds me of a very similar exercise I had to do in C about 15 years ago, taking someone else's War and Peace-sized copy/paste extravaganza and boiling it down by creating subroutines with appropriate parameters (so it could compile and run on MS-DOS, 15 years ago). I was appalled that the author of the original monster had presumably majored in Computer Science. I was only several weeks into learning C at the time, but had already spent a few years with FORTRAN, so successful code analysis was very much a matter of having practical experience.

    Alas, in the current situation, the sad thing about NMRsucks's first opus -- whether reduced or not -- is the complete absence of a sensible algorithm, and the apparent lack of an appropriate spec for what the process is supposed to do.

    Your repair work is inspiring and instructive, and I applaud (and ++) that, but the program itself is still basically unworkable, and the OP will be better off starting over from scratch, after he figures out how to explain the goal properly.

      Thanks. 4 hours of disconnected boredom :)

      I completely agree with your assessment of the underlying lack of algorithm, which is basically why I took it no further (that and the phone engineer calling to tell me the line was back). It actually took much longer to write up (badly) than it did to do.

      I hope that it will serve the OP in three ways.

      1. You don't become a programmer in a week.
      2. Copy & Paste is not the answer.
      3. Small is beautiful. If you can see what your doing, often as not see, you can what your doing wrong.

      ++ to opus!


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon