NMRsucks has asked for the wisdom of the Perl Monks concerning the following question:

Clearly I'm not only bad at scripting but bad at posting! Give me a break! By the way, until about a week ago I had never heard of Perl and had no idea whatsoever about computer programming, so please be nice!! I am writing a script which will check the options in the following file and, if they are valid, keep them; if not, delete them.
!# 23 ! 3 1 ! 9 10 37 ! 11 assign ( (resid 2 and name HB1) or (resid 2 and name HG) or (resid 7 and name HG11) ) ( (resid 2 and name HN) ) 0.0 1.8 3.00 ! 24 !# 25 ! 3 2 ! 9 10 37 ! 7 30 assign ( (resid 2 and name HB1) or (resid 2 and name HG) or (resid 7 and name HG11) ) ( (resid 2 and name HA) or (resid 6 and name HA) ) 0.0 1.8 4.07 ! 26
It works for the top example but not the bottom one, and I don't know why. That is, we're comparing the combinations 2HA and 2HB1, 2HA and 2HG, etc. The first example deletes the line with 7 in it, as it should, while the second example deletes all the lines with 2s even though they are valid. Here is the script:
#!/usr/bin/perl $katfile = "/remote/belgarath/tpukala/pegasus/xplor/signif/25jun2.kat2 +" ; $outfile = "temp.kat"; open(OUTFILE, ">$outfile") or die "Can't open $outfile"; open(KATFILE, "$katfile") or die "Can't open $katfile"; @array = <KATFILE>; chomp @array; $numelements = @array; @arraylist=(0 .. $numelements); foreach $el (@arraylist) { if ($array[$el] =~ /^\)$/ && $array[$el+1] =~ /^\($/) { print(OUTFILE ")\n*\n*\n*\n*\n*\n*\n*\n") } elsif ($array[$el] =~ /assign/ && $array[$el+1] =~ /^\($/) { print(OUTFILE "assign\n*\n*\n*\n*\n*\n*\n*\n*\n*\n*\n" +) } else { print(OUTFILE "$array[$el]\n"); } } close(KATFILE); close(OUTFILE); $distfile = "/remote/belgarath/tpukala/pegasus/xplor/signif/25jun/rnk_ +2 5jun_signif_.dst"; $katfile = "/remote/belgarath/tpukala/pegasus/xplor/signif/temp.kat"; $outfile = "temp2.kat"; open(OUTFILE, ">$outfile") or die "Can't open $outfile"; open(KATFILE, "$katfile") or die "Can't open $katfile"; @array2 = <KATFILE>; chomp @array2; $numelements2 = @array2; @arraylist2=(0 .. $numelements2); #************************** #************************** foreach $el (@arraylist2) { #************************** if ($array2[$el+10] =~ /resid (\d+) and name (H\w+\d*).?\)/ && $array2[$el] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; if ($array2[$el+10] =~ /resid (\d+) and name (H\w+\d*) +. 
?\)/) { $residmatch = $1; $atomidmatch = $2; } open(DISTFILE, "$distfile") or die "Can't open $distfi +l e"; while (<DISTFILE>) { $line=$_; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } elsif($array2[$el+11] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+12] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+13] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+14] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+15] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } }} } #************************* elsif 
($array2[$el+11] =~ /resid (\d+) and name (H\w+\d*).?\)/ && $array2[$el] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; if ($array2[$el+11] =~ /resid (\d+) and name (H\w+\d*) +. ?\)/) { $residmatch = $1; $atomidmatch = $2; } open(DISTFILE, "$distfile") or die "Can't open $distfi +l e"; while (<DISTFILE>) { $line=$_; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } elsif($array2[$el+12] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+13] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+14] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+15] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+16] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m 
idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } }} } #************************* elsif ($array2[$el+12] =~ /resid (\d+) and name (H\w+\d*).?\)/ && $array2[$el] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; if ($array2[$el+12] =~ /resid (\d+) and name (H\w+\d*) +. ?\)/) { $residmatch = $1; $atomidmatch = $2; } open(DISTFILE, "$distfile") or die "Can't open $distfi +l e"; while (<DISTFILE>) { $line=$_; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } elsif($array2[$el+13] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+14] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+15] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+16] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* 
\w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+17] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } }} } #************************* elsif ($array2[$el+13] =~ /resid (\d+) and name (H\w+\d*).?\)/ && $array2[$el] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; if ($array2[$el+13] =~ /resid (\d+) and name (H\w+\d*) +. ?\)/) { $residmatch = $1; $atomidmatch = $2; } open(DISTFILE, "$distfile") or die "Can't open $distfi +l e"; while (<DISTFILE>) { $line=$_; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } elsif($array2[$el+14] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+15] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+16] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+17] =~ /resid (\d+) and name 
(H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+18] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } }} } #************************* elsif ($array2[$el+14] =~ /resid (\d+) and name (H\w+\d*).?\)/ && $array2[$el] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; if ($array2[$el+14] =~ /resid (\d+) and name (H\w+\d*) +. ?\)/) { $residmatch = $1; $atomidmatch = $2; } open(DISTFILE, "$distfile") or die "Can't open $distfi +l e"; while (<DISTFILE>) { $line=$_; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } elsif($array2[$el+15] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+16] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+17] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* 
\w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+18] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+19] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } }} } #************************* elsif ($array2[$el+15] =~ /resid (\d+) and name (H\w+\d*).?\)/ && $array2[$el] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $resid = $1; $atomid = $2; if ($array2[$el+15] =~ /resid (\d+) and name (H\w+\d*) +. 
?\)/) { $residmatch = $1; $atomidmatch = $2; } open(DISTFILE, "$distfile") or die "Can't open $distfi +l e"; while (<DISTFILE>) { $line=$_; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } elsif($array2[$el+16] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+17] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$r +e sid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+18] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+19] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } } elsif($array2[$el+20] =~ /resid (\d+) and name (H\w+\d*).?\)/) { $residmatch = $1; $atomidmatch = $2; if (($line =~ /$atomid.*\s*\w+\s+$resid\s+$ato +m idmatch.* \w+\s+$residmatch\s+\d\./) || ($line =~ /$atomidmatch.*\s*\w ++ \s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./)) { print(OUTFILE "$array2[$el]\n"); last; } }} } #************************* else { 
print(OUTFILE "$array2[$el]\n"); } } close(KATFILE); close(OUTFILE); #*************************** #*************************** $katfile = "/remote/belgarath/tpukala/pegasus/xplor/signif/temp2.kat"; $outfile = "half.kat"; open(OUTFILE, ">$outfile") or die "Can't open $outfile"; open(KATFILE, "$katfile") or die "Can't open $katfile"; @array3 = <KATFILE>; chomp @array3; $numelements3 = @array3; @arraylist3=(0 .. $numelements3); foreach $el (@arraylist3) { if($array3[$el] =~ /^\*$/) { } elsif($array3[$el] eq $array3[$el+1]) { } else { print(OUTFILE "$array3[$el]\n"); } } #end

Replies are listed 'Best First'.
Re: nmr comparison script
by graff (Chancellor) on Jul 21, 2004 at 05:29 UTC
    Whoa. There is so much about your post that is incomprehensible (and so much of it) that it's hard to figure out where to start. I'll try an unordered list...
    • It's not at all clear what is meant by "works" vs. "doesn't work" -- what are you trying to accomplish given the "two parts" of sample input data?
    • You appear to be using two input files, but you only provide sample data from one of them. (I think it's from "25jun2.kat2", not from "rnk_25jun_signif_.dst")
    • You appear to be reading the first input file into an array, then writing that array out to a temp file, adding lots of meaningless lines to the data ("*\n*\n..."), then you read that temp file back into a second array, process the array in this horrible "for" loop, write stuff to a second temp file, and finally read that back into a third array in order to print stuff to a final(?) "half.kat" file. (This all seems inconceivably pointless -- you should be able to accomplish your goal, whatever it is, in a single pass over the data.)
    • You seem to be opening and reading all of the "dist" input file on every iteration of the "for" loop over the lines from that first temp file. (Most people would read the second file into memory just once and store it in an array or something.)
    • You have copied/pasted some massive "if ... elsif ... else ..." block numerous times, with just minor differences between the copies, making the code at least 10 times longer and more complicated than it should be. (Most people figure out how to use a loop or subroutine to eliminate redundant code.) Update: looking closer, I see you have two nested layers of copied if/elsif/.../else blocks, causing a geometric expansion of redundant code (aarrgghh!)
    • You appear to be trying to "parse" the data by counting lines (offsets in the arrays), where most people would do true parsing, which means reading the data into a structure (array or hash containing arrays or hashes) that respects/represents the bracketing of information provided by the data.
    • Your posting of the code appears to have lots of spurious, errorful whitespace (inserted in the middle of variable names, etc.)
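
    As a rough illustration of the "true parsing" point in the list above, here is a minimal sketch that reads each assign statement into a nested structure instead of counting line offsets. It assumes the layout of the sample data in the question (two parenthesised groups of "resid N and name X" selections followed by three numbers). The names parse_restraints and atoms are made up for this example, and the lookup in the distance file is left out because no sample of that file was posted.

```perl
use strict;
use warnings;

# Collect every "resid N and name X" selection found in one group.
# (Hypothetical helper, not part of the original script.)
sub atoms {
    my ($group) = @_;
    my @atoms;
    while ($group =~ /resid\s+(\d+)\s+and\s+name\s+(\w+)/g) {
        push @atoms, { resid => $1, name => $2 };
    }
    return @atoms;
}

# Parse every "assign ( group ) ( group ) n n n" statement in $text.
# The layout is an assumption based on the sample data in the question.
sub parse_restraints {
    my ($text) = @_;
    my @restraints;
    # One atom selection, e.g. "(resid 2 and name HB1)"
    my $atom  = qr/\(\s*resid\s+\d+\s+and\s+name\s+\w+\s*\)/;
    # One group: "( atom or atom or ... )", capturing the atom list
    my $group = qr/\(\s*($atom(?:\s+or\s+$atom)*)\s*\)/;
    while ($text =~ /assign\s*$group\s*$group\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)/g) {
        # Copy the captures before calling atoms(), which runs its own matches.
        my ($first, $second, @numbers) = ($1, $2, $3, $4, $5);
        push @restraints, {
            first   => [ atoms($first) ],
            second  => [ atoms($second) ],
            numbers => \@numbers,
        };
    }
    return @restraints;
}

my $sample = join "\n",
    'assign ( (resid 2 and name HB1) or (resid 2 and name HG) or (resid 7 and name HG11) )',
    '       ( (resid 2 and name HA) or (resid 6 and name HA) ) 0.0 1.8 4.07';

# Every combination of an atom from the first group with one from the second.
for my $r (parse_restraints($sample)) {
    for my $x (@{ $r->{first} }) {
        for my $y (@{ $r->{second} }) {
            print "$x->{resid}$x->{name} - $y->{resid}$y->{name}\n";
        }
    }
}
```

    With the atoms held in two lists, comparing every combination is a pair of nested loops, however many atoms each group contains.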

    There are some "stylistic issues" too, but no point going into that, given the issues above.

    So how about you start over with first principles: you've given a sample of one of the "original" input files, so how about you give (a) a sample of the other input, (b) the intended output, and (c) a brief summary of the criteria and relations that cause the output to be what it is. Think of that summary in terms of a synopsis or a minimal recipe -- that is, think about how you would document this process, how you would provide an overview of what the program is supposed to do. (My rule: write the docs/specs first, well enough that someone else can understand them, before writing any code.)

    Forget about the code you've posted -- it's entirely wrong and it should be deleted. Write the concise description of what your process must do in terms of the input, the output and the logic that relates these two. Then start writing the code from scratch to address that description (or start a new SoPW thread with a question about how to approach that task).

Re: nmr comparison script
by edan (Curate) on Jul 21, 2004 at 07:55 UTC

    You can't post your 737-line script and expect other people to read the whole thing, understand it, and debug it for you. Sorry, but life requires a little bit more effort on your part than that...

    I recommend that you isolate the smallest possible chunk of code that is causing the problem. This is a valuable debugging skill. If you are not able to figure out on your own why this bit of code is not working the way you want it to, then you can post this little bit of code along with your sample data, and tell us what you want this code to do, and what it does instead. Then maybe you'll get useful feedback, as well as learning how to be a better programmer.
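
    For instance, a small test case for this script might exercise just the atom-matching pattern on a handful of lines. The sketch below is only an illustration: the pattern is copied from the posted script, but the line breaks in the sample lines are guessed from the sample data, and match_atom is a name invented here.

```perl
use strict;
use warnings;

# The pattern the posted script applies to each line of the restraint file.
my $atom_re = qr/resid (\d+) and name (H\w+\d*).?\)/;

# Hypothetical helper: returns (resid, name) in list context,
# or an empty list when the line does not match.
sub match_atom {
    my ($line) = @_;
    return $line =~ $atom_re;
}

# A few lines modelled on the sample data; the line layout is an assumption.
my @lines = (
    '(resid 2 and name HB1) or',
    '(resid 7 and name HG11)',
    '(resid 2 and name HN)',
    '! 24',
);

for my $line (@lines) {
    if (my ($resid, $name) = match_atom($line)) {
        print "matched resid=$resid name=$name in: $line\n";
    }
    else {
        print "no match in: $line\n";
    }
}
```

    A script this size, plus the few lines of data it runs on, is something people can read and run in a minute or two.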

    Oh, and you might want to read How do I post a question effectively?, too.

    --
    edan

Re: nmr comparison script
by FoxtrotUniform (Prior) on Jul 21, 2004 at 02:47 UTC

    It would be easier to help you if you gave us a bit more detail -- how does it fail for the second example?

    As for long scripts, you could either post it (between <code> tags, please) after a <readmore> tag, or put it in your public scratchpad -- that way even people who don't want to give you their email addresses can try to help.

    Edit: Much better, thanks.

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    % man 3 strfry

Re: nmr comparison script
by BrowserUk (Patriarch) on Jul 21, 2004 at 15:53 UTC

    Like most other people, I was going to skip over your post because of the length of the code and the appalling layout. But then my phone line was disconnected for 4 hours, which stopped me doing what I wanted to do, and your code sat on the screen in front of me, so I played whilst waiting for the phone to come back.

    A few steps to improving your chances of fixing your code, and to getting some help if you can't do it yourself.

    1. 700+ lines reduced to 450

      The first thing I noticed about your code was the appalling layout.

      Admittedly, about half of that was contributed by the PM codewrap 'feature'. I've had to break lines in this reply in stupid places to stop PM from completely screwing the formatting.

      I fixed this by using perl to parse and reformat the whole thing.

      P:\test>perl -MO=Deparse 376142-dl.pl >376142.pl
      376142-dl.pl syntax OK

      Try it. You will be pleasantly surprised. Consistent indentation makes reading code so much easier.

    2. 450 reduced to 250

      Next was that this block of code

      if ($line =~ /$atomid.*\s*\w+\s+$resid\s+$atomidmatch.* \w+\s+$residmatch\s+\d\./
          or $line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./) {
          print OUTFILE "$array2[$el]\n";
          last;
      }

      which is repeated, exactly, 36 times. So, I moved that into a subroutine.

      sub outputIf {
          my( $line, $atomid, $atomidmatch, $residmatch, $resid, $array2_el, $outfile ) = @_;
          if ($line =~ /$atomid.*\s*\w+\s+$resid\s+$atomidmatch.* \w+\s+$residmatch\s+\d\./
              or $line =~ /$atomidmatch.*\s*\w+\s+$residmatch\s+$atomid.* \w+\s+$resid\s+\d\./ ) {
              print $outfile "$array2_el\n";
              return 1;
          }
          return 0;
      }

      and replaced each occurrence in the main code with

      last if outputIf( $line, $atomid, $atomidmatch, $1, $2, $array2[$el], \*OUTFILE );

    3. 250 reduces to 125

      That done, it becomes immediately apparent that the main bulk of the program now consists of six, nearly identical repetitions of

      So, by sticking that into a subroutine your main for loop reduces to

      @array2 = <KATFILE>;
      chomp @array2;
      foreach $el ( 0 .. @array2 ) {
          next if whileIf( \@array2, $el, 10, $distfile, $outfile );
          next if whileIf( \@array2, $el, 11, $distfile, $outfile );
          next if whileIf( \@array2, $el, 12, $distfile, $outfile );
          next if whileIf( \@array2, $el, 13, $distfile, $outfile );
          next if whileIf( \@array2, $el, 14, $distfile, $outfile );
          next if whileIf( \@array2, $el, 15, $distfile, $outfile );
          print OUTFILE "$array2[$el]\n";
      }

    4. Besides the stupid name, that subroutine is repetitious. A little juggling renders

    5. Down to under 100 lines.

      The modified program

      Still far from perfect. It almost certainly doesn't fix your problem--and given that I can't test it without the appropriate data, it may well not even do what your original does--but had you posted this code, you would probably have got a lot more interest from people in trying to help you with your problem. Whenever you find yourself copy&pasting code, do so once--into a subroutine.

      Note: I've stuck with the stupid subroutine names, because I don't know enough about what your program is doing to choose good ones.
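
      To illustrate the same idea one step further, runs of near-identical calls that differ only in a number can usually be folded into a loop over those numbers. This is a minimal sketch, not the modified program: names_atom_at is a hypothetical stand-in for the subroutine above (it only checks the pattern and never consults the distance file), and first_matching_offset is likewise a made-up name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real check: true when the line $offset places
# after $el names an atom. The real routine would also read the distance file.
sub names_atom_at {
    my ($lines, $el, $offset) = @_;
    my $i = $el + $offset;
    return 0 if $i > $#$lines;    # past the end of the data
    return $lines->[$i] =~ /resid (\d+) and name (H\w+\d*).?\)/ ? 1 : 0;
}

# One loop over the offsets replaces a run of pasted calls.
sub first_matching_offset {
    my ($lines, $el, @offsets) = @_;
    for my $offset (@offsets) {
        return $offset if names_atom_at($lines, $el, $offset);
    }
    return;    # no offset matched
}

# Toy data: an atom line, eleven filler lines, then another atom line.
my @lines = ('(resid 2 and name HB1) or', ('*') x 11, '(resid 2 and name HN) )');

my $hit = first_matching_offset(\@lines, 0, 10 .. 15);
print defined $hit ? "first match at offset $hit\n" : "no match\n";
```

      The range of offsets is now a single argument, so widening or narrowing it means changing one number rather than pasting another block.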

    Give me a break! By the way, until about a week ago I had never heard of Perl and had no idea whatsoever about computer programming, so please be nice!!

    Programming is hard. If I walked in to your lab and told you that I had lifted random bits of technique from these papers: alopecia, Mutations cause extra limbs in amphibians, in my attempt to create a 4-legged bald chicken for the food industry, but it didn't work and asked for your help.

    What would your reaction be?

    Whoever expects you to pick up programming in one week and do something useful with it, whether that is your boss, your lecturer, lab mentor or yourself, is doing you no favours at all. Most programmers spend years learning their craft; trying to do it in a week is just insulting.

    I'm not trying to discourage you, but get some training.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      That is truly a beautiful and impressive demonstration of code analysis and reduction. It reminds me of a very similar exercise I had to do in C about 15 years ago, taking someone else's War and Peace-sized copy/paste extravaganza and boiling it down by creating subroutines with appropriate parameters (so it could compile and run on MS-DOS, 15 years ago). I was appalled that the author of the original monster had presumably majored in Computer Science. I was only several weeks into learning C at the time, but had already spent a few years with FORTRAN, so successful code analysis was very much a matter of having practical experience.

      Alas, in the current situation, the sad thing about NMRsucks's first opus -- whether reduced or not -- is the complete absence of a sensible algorithm, and the apparent lack of an appropriate spec for what the process is supposed to do.

      Your repair work is inspiring and instructive, and I applaud (and ++) that, but the program itself is still basically unworkable, and the OP will be better off starting over from scratch, after he figures out how to explain the goal properly.

        Thanks. 4 hours of disconnected boredom :)

        I completely agree with your assessment of the underlying lack of algorithm, which is basically why I took it no further (that and the phone engineer calling to tell me the line was back). It actually took much longer to write up (badly) than it did to do.

        I hope that it will serve the OP in three ways.

        1. You don't become a programmer in a week.
        2. Copy & Paste is not the answer.
        3. Small is beautiful. If you can see what you're doing, often as not you can see what you're doing wrong.

        ++ to opus!

