You're working on the sequence on a character-by-character basis, and trying to solve multiple subproblems at once. You're trying to figure out whether to turn the color to black, blue or red, and simultaneously trying to add line breaks every 50 characters. The logic to do that can be complicated, and there's too much to keep track of to debug it simply.
I suggest you find a way to break the problem down into pieces that are independent, and easier to debug in isolation. There are plenty of ways to do it, but I came up with the following method: First, build a list of operations to you want to perform at varying places in the string, and then write some code to perform each operation. To do so, I built a list called @ops, where each entry contains a column number and the operation I want to perform. Doing it this way, you can look at the list of operations to see if you've got it right. Then, you can generate your final string and see if it's correct.
The general approach I took is this:
The code I came up with to do the job is:
#!/usr/bin/perl use strict; use warnings; # Turn these to 0 to turn off debug display, or delete the appropriate + lines below my $dbg_OPS_LIST=1; my $dbg_OUTPUT=1; my $sequence="GGCGCAACGCTGAGCAGCTGGCGCGTCCCGCGCGGCCCCAGTTCTGCGCAGCTTCC +CGAGGCTCCGCACCAGCCGCGCTTCTGTCCGCCTGCAGGGCATT"; # Break output after every Nth character my $break = 50; # Array of operations to perform at particular character positions in +the sequence my @ops; # Add line breaks to our operations list { my $i=1; while (length($sequence) > $i*$break+1) { push @ops, [ $i*$break+1, 'BREAK' ]; ++$i; } } ### Build color-switch operations: Each column position in the input ### file must be RED, all others should be black. my $seqlen = length ($sequence); my $input = "pm1103009.data"; open( my $infile, "<", $input ) || die "Check the $input $!\n"; my $old = -1; while ( my $line = <$infile> + 0 ){ if ($line > $old+1) { # skip sequential numbers (don't need to switch to 'red' on # *every* character in reject sequence) # add "switch to black" operation (unless we're still at start +) push @ops, [ $old, 'BLACK' ] unless $old < 0; push @ops, [ $line, 'RED' ]; } $old = $line; } close ($infile); ### remove any color switches after the switch to blue @ops = grep { # Keep all ops before 10 chars before the end $_->[0] < length($sequence)-10 # Keep only 'BREAK' (noncolor) entries from then on or $_->[1] eq 'BREAK' } @ops; ### Add operation to switch to BLUE at 10 chars before the end push @ops, [ length($sequence)-10, 'BLUE' ]; ### Add DONE operation at end push @ops, [ length($sequence), 'DONE' ]; ### Sort the operations by column @ops = sort { $a->[0] <=> $b->[0] } @ops; ### Display our operation list if ($dbg_OPS_LIST) { print "\nFinal operations list:\n"; for my $r (@ops) { printf "% 3u: <%s>\n", $r->[0], $r->[1]; } print "\n"; } ### Now build our HTML file open(AA, ">fragments.html") or die $!; print AA q( <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.d +td"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <title>Colored Gene Walk</title> <meta http-equiv="content-type" content="text/html;charset=utf +-8"/> </head> <body> <pre class=\"monofont\"> <a style="font-size: 12pt"> ); my $pos = 1; while ($pos < length($sequence)) { # Get the next operation from the list: my ($col, $op) = @{shift @ops}; $dbg_OUTPUT && print "LOOP: POS:$pos, COL:$col, OP:$op\n"; # Print the bit of the sequence up to the operation (if any) if ($pos < $col) { my $len = $col-$pos; my $subseq = substr($sequence, $pos-1, $len); $dbg_OUTPUT && print "\tTEXT($pos-$col): <$subseq>\n"; print AA $subseq; } # perform the operation if ($op eq 'BLACK') { $dbg_OUTPUT && print "\tBLACK($col)\n"; print AA '</span>'; } elsif ($op eq 'BREAK') { $dbg_OUTPUT && print "\tBREAK($col)\n"; print AA '<br/>'; } elsif ($op eq 'BLUE') { $dbg_OUTPUT && print "\tBLUE($col)\n"; print AA '<span style="color:blue">'; } elsif ($op eq 'RED') { $dbg_OUTPUT && print "\tRED($col)\n"; print AA '<span style="color:red">'; } elsif ($op eq 'DONE') { $dbg_OUTPUT && print "END LOOP\n"; print AA '</a></pre></body></html>', "\n"; } $pos = $col; }
I left in some debugging code, just in case it helps. Notice that I tried to make each section stand on its own, so you can look at what's happening in each step. Running it as-is gives me:
$ perl pm1103009.pl Final operations list: 11: <RED> 25: <BLACK> 51: <BREAK> 51: <RED> 60: <BLACK> 86: <RED> 90: <BLUE> 100: <DONE> LOOP: POS:1, COL:11, OP:RED TEXT(1-11): <GGCGCAACGC> RED(11) LOOP: POS:11, COL:25, OP:BLACK TEXT(11-25): <TGAGCAGCTGGCGC> BLACK(25) LOOP: POS:25, COL:51, OP:BREAK TEXT(25-51): <GTCCCGCGCGGCCCCAGTTCTGCGCA> BREAK(51) LOOP: POS:51, COL:51, OP:RED RED(51) LOOP: POS:51, COL:60, OP:BLACK TEXT(51-60): <GCTTCCCGA> BLACK(60) LOOP: POS:60, COL:86, OP:RED TEXT(60-86): <GGCTCCGCACCAGCCGCGCTTCTGTC> RED(86) LOOP: POS:86, COL:90, OP:BLUE TEXT(86-90): <CGCC> BLUE(90) LOOP: POS:90, COL:100, OP:DONE TEXT(90-100): <TGCAGGGCAT> END LOOP
The code isn't any shorter than yours, but it should be easy to debug. Additionally, since it's broken into sections, you can use the same operations list to convert it to PDF, a Word document, or anything else, if you like. You can also add new operation types (add bold, hyperlinks or anything else).
I intended to go into more discussion, but I ran out of time (I'm at work right now), so if you're interested, examine the code. If you have any specific questions, follow up with a question, and I'll get back to you when I can.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
In reply to Re: Code optimization help and troubleshooting
by roboticus
in thread Code optimization help and troubleshooting
by newtoperlprog
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |