newtoperlprog has asked for the wisdom of the Perl Monks concerning the following question:

Dear All

I am trying to output a sequence file with some regions in bold red using a Perl script.

Below are my script and input files:

#!/usr/bin/perl
use strict;
use warnings;
use Term::ANSIColor;
my $file = $ARGV[0];
if (@ARGV < 1){
    print STDERR "Usage: $0 input_fasta_file\n";
    exit 1;
}
my ($header, $sequence);
open (A, "<", $file) or die "Check the file: $!";
while (my $line = <A>){
    chomp $line;
    if ($line =~ /^(>.*)/){
        $header = $1;
    }
    else{
        $sequence .= $line;
    }
}
close (A);
$sequence =~s/[\n\s]//;
my @sequence = split ("", $sequence);
#print "@sequence\n";
my @position;
my $pos_file = "sorted_position_walk.txt";
open (A, "<", $pos_file) or die "Check the file: $!";
while (my $line = <A>){
    chomp $line;
    my $pos = (split /\t/,$line)[0];
    push(@position, $pos);
}
for (my $i=0;$i<=scalar(@sequence);$i++){
    foreach my $value(@position){
        if($value eq $sequence[$i]){
            my $colortext = colored (($value+18), 'bold red');
            print "$sequence[$i]\n";
        }
    }
}
Fasta file:
>gi|292658763|ref|NM_014143.3| Homo sapiens CD274 molecule (CD274), transcript variant 1, mRNA
GGCGCAACGCTGAGCAGCTGGCGCGTCCCGCGCGGCCCCAGTTCTGCGCAGCTTCCCGAGGCTCCGCACC
AGCCGCGCTTCTGTCCGCCTGCAGGGCATTCCAGAAAGATGAGGATATTTGCTGTCTTTATATTCATGAC
CTACTGGCATTTGCTGAACGCATTTACTGTCACGGTTCCCAAGGACCTATATGTGGTAGAGTATGGTAGC
AATATGACAATTGAATGCAAATTCCCAGTAGAAAAACAATTAGACCTGGCTGCACTAATTGTCTATTGGG
AAATGGAGGATAAGAACATTATTCAATTTGTGCATGGAGAGGAAGACCTGAAGGTTCAGCATAGTAGCTA
CAGACAGAGGGCCCGGCTGTTGAAGGACCAGCTCTCCCTGGGAAATGCTGCACTTCAGATCACAGATGTG
AAATTGCAGGATGCAGGGGTGTACCGCTGCATGATCAGCTATGGTGGTGCCGACTACAAGCGAATTACTG
TGAAAGTCAATGCCCCATACAACAAAATCAACCAAAGAATTTTGGTTGTGGATCCAGTCACCTCTGAACA
TGAACTGACATGTCAGGCTGAGGGCTACCCCAAGGCCGAAGTCATCTGGACAAGCAGTGACCATCAAGTC
CTGAGTGGTAAGACCACCACCACCAATTCCAAGAGAGAGGAGAAGCTTTTCAATGTGACCAGCACACTGA
GAATCAACACAACAACTAATGAGATTTTCTACTGCACTTTTAGGAGATTAGATCCTGAGGAAAACCATAC
AGCTGAATTGGTCATCCCAGAACTACCTCTGGCACATCCTCCAAATGAAAGGACTCACTTGGTAATTCTG
GGAGCCATCTTATTATGCCTTGGTGTAGCACTGACATTCATCTTCCGTTTAAGAAAAGGGAGAATGATGG
ATGTGAAAAAATGTGGCATCCAAGATACAAACTCAAAGAAGCAAAGTGATACACATTTGGAGGAGACGTA
ATCCAGCATTGGAACTTCTGATCTTCAAGCAGGGATTCTCAACCTGTGGTTTAGGGGTTCATCGGGGCTG
AGCGTGACAAGAGGAAGGAATGGGCCCGTGGGATGCAGGCAATGTGGGACTTAAAAGGCCCAAGCACTGA
AAATGGAACCTGGCGAAAGCAGAGGAGGAGAATGAAGAAAGATGGAGTCAAACAGGGAGCCTGGAGGGAG
ACCTTGATACTTTCAAATGCCTGAGGGGCTCATCGACGCCTGTGACAGGGAGAAAGGATACTTCTGAACA
AGGAGCCTCCAAGCAAATCATCCATTGCTCATCCTAGGAAGACGGGTTGAGAATCCCTAATTTGAGGGTC
AGTTCCTGCAGAAGTGCCCTTTGCCTCCACTCAATGCCTCAATTTGTTTTCTGCATGACTGAGAGTCTCA
GTGTTGGAACGGGACAGTATTTATGTATGAGTTTTTCCTATTTATTTTGAGTCTGTGAGGTCTTCTTGTC
ATGTGAGTGTGGTTGTGAATGATTTCTTTTGAAGATATATTGTAGTAGATGTTACAATTTTGTCGCCAAA
CTAAACTTGCTGCTTAATGATTTGCTCACATCTAGTAAAACATGGAGTATTTGTAAGGTGCTTGGTCTCC
TCTATAACTACAAGTATACATTGGAAGCATAAAGATCAAACCGTTGGTTGCATAGGATGTCACCTTTATT
TAACCCATTAATACTCTGGTTGACCTAATCTTATTCTCAGACCTCAAGTGTCTGTGCAGTATCTGTTCCA
TTTAAATATCAGCTTTACAATTATGTGGTAGCCTACACACATAATCTCATTTCATCGCTGTAACCACCCT
GTTGTGATAACCACTATTATTTTACCCATCGTACAGCTGAGGAAGCAAACAGATTAAGTAACTTGCCCAA
ACCAGTAAATAGCAGACCTCAGACTGCCACCCACTGTCCTTTTATAATACAATTTACAGCTATATTTTAC
TTTAAGCAATTCTTTTATTCAAAAACCATTTATTAAGTGCCCTTGCAATATCAATCGCTGTGCCAGGCAT
TGAATCTACAGATGTGAGCAAGACAAAGTACCTGTCCTCAAGGAGCTCATAGTATAATGAGGAGATTAAC
AAGAAAATGTATTATTACAATTTAGTCCAGTGTCATAGCATAAGGATGATGCGAGGGGAAAACCCGAGCA
GTGTTGCCAAGAGGAGGAAATAGGCCAATGTGGTCTGGGACGGTTGGATATACTTAAACATCTTAATAAT
CAGAGTAATTTTCATTTACAAAGAGAGGTCGGTACTTAAAATAACCCTGAAAAATAACACTGGAATTCCT
sorted_position_walk.txt: 88 91 92 94 101 113 114 121 122 124 125 126 134 140 146 148 153 159 171 173 183

In the final output, I want the regions that match sorted_position_walk.txt to be bold and red: each listed position plus the following 18 characters, so 19 characters in total per region. For example, positions 88 through 106 (88 + 18) should be bold and red, and so on.
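A minimal sketch of one way to get this output, assuming the positions in sorted_position_walk.txt are 1-based and each region runs from the listed position through position + 18. Rather than splicing escape codes into the sequence (which shifts every later position), it marks highlighted bases in a per-position mask and builds a new string in one pass, so overlapping regions such as 88 and 91 are harmless. The sequence and positions below are hypothetical stand-ins for the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Term::ANSIColor;    # exports color() and colored() by default

# Hypothetical stand-ins for the FASTA sequence and sorted_position_walk.txt.
my $sequence  = 'ACGT' x 15;       # 60 bases
my @positions = (3, 10, 40);       # assumed 1-based start positions
my $extra     = 18;                # highlight position .. position + 18

# Mark every highlighted base in a per-position mask (0-based indices).
# Overlapping regions just mark the same bases again.
my @mask = (0) x length($sequence);
for my $pos (@positions) {
    my $first = $pos - 1;
    next if $first < 0 || $first > $#mask;
    my $last = $first + $extra;
    $last = $#mask if $last > $#mask;      # clip at the end of the sequence
    $mask[$_] = 1 for $first .. $last;
}

# Build a new string in one pass, switching color only at run boundaries.
# Nothing is inserted into $sequence itself, so no positions shift.
my $out = '';
my $on  = 0;
for my $i (0 .. $#mask) {
    if    ( $mask[$i] && !$on) { $out .= color('bold red'); $on = 1 }
    elsif (!$mask[$i] &&  $on) { $out .= color('reset');    $on = 0 }
    $out .= substr $sequence, $i, 1;
}
$out .= color('reset') if $on;
print "$out\n";
```

With real data, @positions would be filled from the first column of sorted_position_walk.txt and $sequence from the FASTA file.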

I would also like to save the output to a text file with the bold red formatting preserved.

Any help and direction will be greatly appreciated.

Regards

Re: bold color text and export to file
by roboticus (Chancellor) on Sep 15, 2014 at 22:31 UTC

    newtoperlprog:

    Something like this?

    use strict;
    use warnings;
    use Term::ANSIColor;
    my $t = "ACGCGATAGCATTAGACCTGGCACAGT";
    # wrap every CAT or GAT in bold red escape codes
    $t =~ s/([CG]AT)/colored($1,'bold red')/ge;
    print $t, "\n";

    If you have the positions and lengths you want highlighted, you can do it like this:

    use strict;
    use warnings;
    use Term::ANSIColor;
    my $t = "ACGCGATAGCATTAGACCTGGCACAGT";
    my @highlights = (
        # [ start, len, color ]  (start is a 0-based offset)
        [ 20, 5, 'bold blue' ],
        [ 10, 2, 'bold red' ],
        [ 3, 4, 'bold green' ],
    );
    for my $ar (@highlights) {
        my ($start, $len, $color) = @$ar;
        $t = substr($t, 0, $start)                        # first part
           . colored(substr($t, $start, $len), $color)    # colored part
           . substr($t, $start + $len);                   # final part
    }
    print $t, "\n";

    Note: If you do it this way, keep in mind that colorizing inserts escape sequences, which changes the positions of all the characters that follow, so: (1) always start at the end and work towards the front, and (2) *NEVER* use overlapping areas.
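    A sketch (not roboticus's code) that makes both rules hold automatically: first merge regions that overlap or touch, then apply the edits from right to left. It assumes [start, len] pairs with 0-based starts; the regions are made up for illustration.

```perl
use strict;
use warnings;
use Term::ANSIColor;

my $plain   = 'ACGCGATAGCATTAGACCTGGCACAGT';
my $t       = $plain;
my @regions = ([3, 4], [5, 4], [20, 5]);   # [start, len]; the first two overlap

# 1) Sort by start and merge regions that overlap or touch.
my @merged;
for my $r (sort { $a->[0] <=> $b->[0] } @regions) {
    my ($s, $len) = @$r;
    my $end = $s + $len;                              # one past the last base
    if (@merged && $s <= $merged[-1][0] + $merged[-1][1]) {
        my $prev_end = $merged[-1][0] + $merged[-1][1];
        $merged[-1][1] = ($end > $prev_end ? $end : $prev_end) - $merged[-1][0];
    }
    else {
        push @merged, [ $s, $len ];
    }
}

# 2) Color from right to left so earlier offsets are still valid.
for my $r (reverse @merged) {
    my ($s, $len) = @$r;
    substr($t, $s, $len) = colored(substr($t, $s, $len), 'bold red');
}
print $t, "\n";
```

    Merging trades the individual colors for one color per merged run; if overlapping regions need different colors, a per-character approach is a better fit.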

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: bold color text and export to file
by Perlbotics (Archbishop) on Sep 15, 2014 at 21:22 UTC

    You could start with Term::ANSIColor. Insert the ANSI escape sequences for bold red at the start of each region to be highlighted, and switch back to normal at the end of the region.
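    A minimal sketch of that idea, using color() to emit the escape sequences directly instead of colored(). The region offsets are hypothetical and 0-based.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(color);

# color('bold red') switches the attributes on; color('reset') switches them off.
my $seq = 'GGCGCAACGCTGAGCAGCTGG';
my ($start, $len) = (5, 6);             # hypothetical 0-based region
my $marked = substr($seq, 0, $start)
           . color('bold red') . substr($seq, $start, $len) . color('reset')
           . substr($seq, $start + $len);
print "$marked\n";
```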

Re: bold color text and export to file
by Anonymous Monk on Sep 15, 2014 at 21:23 UTC

    I am also trying to get the output saved in a text file with the formatting of bold and red saved.

    Um, bold/color? Are you talking about ANSI color codes or HTML?

Re: bold color text and export to file
by pvaldes (Chaplain) on Sep 22, 2014 at 20:10 UTC

    color output was difficult to save in text file

    You can do it. It depends on the desired output format... want a yummy PDF?

    use strict;
    use warnings;
    my $dna = "AAAAAACAAACAAACCCAATATATATATACGACATATATTATATATTATACCCCGGG";
    my $preciouss = substr $dna, 15, 28;
    open (my $output, '>', "latex_file.tex") or die "Cannot write latex_file.tex: $!";
    print $output "\\documentclass{article}\n\\usepackage{color}\n";
    print $output "\\begin{document}\n";
    print $output "THIS IS MY BORING GENE: \\textcolor{red}{", $preciouss, "}\\\\\\textcolor{blue}{OKAY, OKAY... \\textit{NOT SO BORING}, {\\bf IS RED!}}";
    print $output "\n\\end{document}";
    close $output;
    system("pdflatex latex_file.tex");
    system("xpdf latex_file.pdf &");

    You will need a decent TeX distribution and xpdf installed; see TeX Live. From here you can also easily convert to HTML, DVI or PostScript, as you will.

    Update: typo fixed in textit and new link

Re: bold color text and export to file
by newtoperlprog (Sexton) on Sep 15, 2014 at 21:29 UTC

    Thank you for your reply

    I have used the module Term::ANSIColor and have tried to color the output in the variable $colortext upon matching.

    I am trying to keep the terminal formatting in the exported text file.

    Regards

      What program(s) do you expect to view the exported text file in?

      Colors applied with Term::ANSIColor will display with cat, and with less if you use its -R switch, but not in ordinary text editors.
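      A sketch showing that the escape sequences survive a plain write to disk, so cat or less -R will show the colors when the file is viewed later. The file name is hypothetical; colorstrip() (exported on request by Term::ANSIColor) removes the codes again if a plain copy is needed.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(colored colorstrip);

my $file = 'highlighted.txt';                 # hypothetical file name
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} 'AGCT', colored('GGCATT', 'bold red'), "CCAG\n";
close $out or die "Cannot close $file: $!";

# The raw escape codes are now in the file: `cat highlighted.txt` or
# `less -R highlighted.txt` render them; most text editors show them raw.
open my $in, '<', $file or die "Cannot read $file: $!";
my $line = <$in>;
close $in;

# colorstrip() removes the escape codes if a plain copy is needed.
my $plain = colorstrip($line);
print $plain;
```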

Re: bold color text and export to file
by newtoperlprog (Sexton) on Sep 16, 2014 at 16:50 UTC

    Dear Roboticus

    I greatly appreciate your help. I tried your solution and it worked like charm. However, I am trying to build the array automatically, and it looks like my syntax is wrong.

    I am trying to build @highlights by reading a file that contains the positions, and it doesn't seem to work. When I put the positions in the code manually, the output is not what I am trying to get.

    Below is my code and data:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Term::ANSIColor;
    my $file = $ARGV[0];
    if (@ARGV < 1){
        print STDERR "Usage: $0 input_fasta_file\n";
        exit 1;
    }
    my ($header, $sequence);
    open (A, "<", $file) or die "Check the file: $!";
    while (my $line = <A>){
        chomp $line;
        if ($line =~ /^(>.*)/){
            $header = $1;
        }
        else{
            $sequence .= $line;
        }
    }
    close (A);
    $sequence =~s/[\n\s]//;
    #print "$sequence\n";
    =comment
    my @highlights = (
        [ 88, 1, 'bold red' ],
        [ 101, 1, 'bold red' ],
        [ 113, 1, 'bold red' ],
        [ 121, 1, 'bold red' ],
        [ 124, 1, 'bold red' ],
        [ 134, 1, 'bold red' ],
        [ 140, 1, 'bold red' ],
        [ 146, 1, 'bold red' ],
    );
    =cut
    my @highlights;
    my $pos_file = "sorted_position_walk.txt";
    open (A, "<", $pos_file) or die "Check the file: $!";
    while (my $line = <A>){
        chomp $line;
        my $pos = (split /\t/,$line)[0];
        # push(@position, $pos);
        push(@highlights, "([ $pos, 1, 'bold yellow' ],)");
    }
    #foreach my $pp(@highlights){
    #    print "[ $pp, 1, 'bold red' ],\n";
    #}
    #print "@highlights\n";
    for my $ar (@highlights) {
        my ($start, $len, $color) = @$ar;
        $sequence = substr($sequence, 0, $start-1)                   # first part
                  . colored(substr($sequence,$start,$len), $color)   # colored part
                  . substr($sequence,$start+$len);                   # final part
    }
    print $sequence, "\n";
    Output with some weird characters: TCTGTCCGCTCG3m1m[0mGGCAT
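    One likely problem in the code above: push(@highlights, "([ $pos, 1, 'bold yellow' ],)") stores a plain string, not an array reference, so under use strict the later @$ar should die with a "Can't use string ... as an ARRAY ref" error. A sketch of building real array references instead, assuming one position per line (first tab-separated column); an in-memory string stands in for sorted_position_walk.txt.

```perl
use strict;
use warnings;

# In-memory stand-in for sorted_position_walk.txt (one position per line).
my $data = "88\n101\n113\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @highlights;
while (my $line = <$fh>) {
    chomp $line;
    my $pos = (split /\t/, $line)[0];
    push @highlights, [ $pos, 1, 'bold yellow' ];   # [ ... ] builds an array ref
}
close $fh;

# Highest position first, so each edit leaves earlier offsets untouched.
@highlights = sort { $b->[0] <=> $a->[0] } @highlights;
print "@$_\n" for @highlights;
```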

    Regards

      newtoperlprog:

      When you use the color editing code, it inserts characters into the string, which shifts all the following character positions to the right. That's why, in the note at the end of my first reply, I suggested that you always start from the right and work left. That's also the reason you're seeing weird characters in the text: they are parts of the terminal commands that tell the terminal to change colors.

      For example, suppose you wanted to highlight the string CAT every time it appears, and also that the terminal command for switching to bold is BEER and the command for changing the color back to normal is KETCHUP. Then if your input string looks like this:

      ATCGCGATCATCCATACTCATTAG

      Then the positions you want to highlight are at 8, 12 and 18 (counting from 0). If we apply the edits from left to right we get:

      ATCGCGATBEERCATKETCHUPCCATACTCATTAG   (edit at 8)
      ATCGCGATBEEBEERRCAKETCHUPTKETCHUPCCATACTCATTAG   (then 12)
      ATCGCGATBEEBEERRCABEERKETKETCHUPCHUPTKETCHUPCCATACTCATTAG   (then 18)
              ???^^^^?  ^^^^???vvvvvvv?????vvvvvvv

      UGH! The string has garbage in it now. To point it out, I've marked the resulting string with ^^^^ to indicate the command to switch to bold, vvvvvvv to show the command that switches back to normal, and ? for any garbage characters. So we'd see BEE in the regular color, followed by RCAKET in bold, then CHUPTCCATACTCATTAG in regular color. If we do the edits from right to left, though, we get:

      ATCGCGATCATCCATACTBEERCATKETCHUPTAG   (edit at 18)
      ATCGCGATCATCBEERCATKETCHUPACTBEERCATKETCHUPTAG   (then 12)
      ATCGCGATBEERCATKETCHUPCBEERCATKETCHUPACTBEERCATKETCHUPTAG   (then 8)
              ^^^^   vvvvvvv ^^^^   vvvvvvv   ^^^^   vvvvvvv

      So the three "CAT" sequences are in bold, and the rest of the string is displayed in regular color.
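      The right-to-left procedure can be reproduced in a few lines of Perl, using the same made-up BEER and KETCHUP markers in place of real escape sequences. Within each edit, the closing marker is inserted first, so that inserting the opening marker does not move it.

```perl
use strict;
use warnings;

# Made-up markers from the explanation: BEER switches bold on,
# KETCHUP switches back to normal. Positions are 0-based.
my $dna    = 'ATCGCGATCATCCATACTCATTAG';
my @starts = (8, 12, 18);
my $len    = 3;

my $marked = $dna;
for my $s (sort { $b <=> $a } @starts) {        # rightmost edit first
    substr($marked, $s + $len, 0) = 'KETCHUP';  # closing marker first...
    substr($marked, $s, 0)        = 'BEER';     # ...then the opening one
}
print "$marked\n";
```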

      To convert your data to use HTML formatting instead, you can use roughly the same code, but instead of inserting BEER and KETCHUP from Term::ANSIColor (or whatever it is that it uses for your terminal), you could use <font color="red"> and </font> as Anonymous Monk indicated later in the thread. In that case, though, you'll *still* want to go from right to left!

      Notes:

      1) Now that the node is formatted, I'm really revolted by the pairing of beer and ketchup. But it was annoying enough to format that I'll leave it alone, nausea-inducing as it is.

      2) If you had replied to my node instead of yourself, I'd've seen the message sooner and replied. (When time pressed, I generally only look at top-level nodes and replies to my nodes.)

      ...roboticus


        Dear Roboticus and Anonymous Monk

        Thank you for your suggestions and directions, and I am sorry for any confusion.

        As you mentioned in your earlier post, this will not work if the regions overlap, and I have found that some of my regions do overlap.

        I was also wondering whether this could be done with an array built from the range operator, so that each position listed in the array would be printed in bold (or color).

        Below is the code I was trying:

        #!/usr/bin/perl
        #use strict;
        use warnings;
        my $filename = "NM_014143.3.fasta";
        my @name = split( /\./, $filename );
        my $name = $name[0];
        my ($infile, @temp);
        open( $infile, "<", $filename ) || die "Check the $filename $!\n";
        while ( my $line = <$infile> ) {
            chomp $line;
            if ( $line =~ /^>/ ) {
                next;
            }
            elsif ( $line =~ /^\s*$/ ) {
                next;
            }
            elsif ( $line =~ /^\s*#/ ) {
                next;
            }
            else {
                $sequence .= $line;
            }
        }
        $sequence =~ s/\n//g;
        $sequence =~ s/\s+//g;
        #print "$sequence\n";
        close ($infile);
        my @seq = (1 .. 15, 30 .. 40, 50 .. 60); #this is a range array
        for (my $pos=1;$pos<=length($sequence);$pos++){
            foreach my $ar(@seq){
                if($pos == $ar){
                    push (@temp, "<b>",$sequence, "</b");
                }
                else{
                    push (@temp, $sequence);
                }
            }
        }
        my $tmp = join ("", @temp);
        print "$tmp\n";
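        A sketch of the range-array idea, assuming the ranges hold 1-based positions: put every highlighted position in a hash, then build a new string in one pass, opening <b> when a highlighted run starts and closing it when the run ends. Because nothing is inserted into the original sequence, positions never shift, and overlapping ranges (10 .. 12 and 11 .. 14 below) merge naturally. The input string is a hypothetical short sequence.

```perl
use strict;
use warnings;

my $sequence = 'GGCGCAACGCTGAGCAGCTGGCGCG';    # hypothetical short input
my %highlight;
$highlight{$_} = 1 for (1 .. 5, 10 .. 12, 11 .. 14);   # overlaps are fine

my $out  = '';
my $bold = 0;
for my $pos (1 .. length $sequence) {
    my $want = $highlight{$pos} ? 1 : 0;
    if ($want != $bold) {                       # a run starts or ends here
        $out .= $want ? '<b>' : '</b>';
        $bold = $want;
    }
    $out .= substr $sequence, $pos - 1, 1;      # substr() is 0-based
}
$out .= '</b>' if $bold;                        # close a run at the very end
print "$out\n";
```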
        Data file: GGCGCAACGCTGAGCAGCTGGCGCGTCCCGCGCGGCCCCAGTTCTGCGCAGCTTCCCGAGGCTCCGCACC + CC +TGCAGGGCATTCCAGAAAGATGAGGATATTTGCTGTCTTTATATTCATGACCATTTGCTGAACGCATT TACTGTCACGGTTCCCAAGGACCTATATGTGGTAGAGTATGGTAGC + AT +GACAATTGAATGCAAATTCCCAGTAGAAAAACAATTAGACCTGGCTGCACTAATTGTCTATTGGG AAATGGAGGATAAGAACATTATTCAATTTGTGCATGGAGAGGAAGACCTGAAGGTTCAGCATAGTAGCTA CAGACAGAGGGCCCGGCTGTTGAAGGACCAGCTCTCCCTGGGAAATGCTGCACTTCAGATCACAGATGTG AAATTGCAGGATGCAGGGGTGTACCGCTGCATGATCAGCTATGGTGGTGCCGACTACAAGCGAATTACTG TGAAAGTCAATGCCCCATACAACAAAATCAACCAAAGAATTTTGGTTGTGGATCCAGTCACCTCTGAACA TGAACTGACATGTCAGGCTGAGGGCTACCCCAAGGCCGAAGTCATCTGGACAAGCAGTGACCATCAAGTC CTGAGTGGTAAGACCACCACCACCAATTCCAAGAGAGAGGAGAAGCTTTTCAATGTGACCAGCACACTGA
bold color text and export to file
by newtoperlprog (Sexton) on Sep 22, 2014 at 15:18 UTC

    Dear All

    Thank you for your replies

    Since color output was difficult to save in text file, I am now trying to format the output as HTML instead, which will suffice for my needs.

    For the sequence file, I am trying to color parts of the sequence red, leaving the rest in the default font color (black).

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $sequence = '';
    my $filename = "NM_014143.3.fasta";
    my @name = split( /\./, $filename );
    my $name = $name[0];
    my $infile;
    my $outfile;
    my $out;
    my $reject;
    my @missing;
    open( $infile, "<", $filename ) || die "Check the $filename $!\n";
    while ( my $line = <$infile> ) {
        chomp $line;
        if ( $line =~ /^>/ ) {
            next;
        }
        elsif ( $line =~ /^\s*$/ ) {
            next;
        }
        elsif ( $line =~ /^\s*#/ ) {
            next;
        }
        else {
            $sequence .= $line;
        }
    }
    $sequence =~ s/\n//g;
    $sequence =~ s/\s+//g;
    #print "$sequence\n";
    close ($infile);
    my @seq = (1 .. 15, 30 .. 40, 50 .. 60);
    for (my $pos = 0; $pos <= length($sequence); $pos++){
        foreach my $ran (@seq){
            my $frag = substr($sequence, $pos, $ran);
            print "<font color=\"red\">$frag</font>\n";
        }
    }
    Data file:
    >gi|292658763|ref|NM_014143.3| Homo sapiens CD274 molecule (CD274), transcript variant 1, mRNA
    GGCGCAACGCTGAGCAGCTGGCGCGTCCCGCGCGGCCCCAGTTCTGCGCAGCTTCCCGAGGCTCCGCACC
    AGCCGCGCTTCTGTCCGCCTGCAGGGCATTCCAGAAAGATGAGGATATTTGCTGTCTTTATATTCATGAC
    CTACTGGCATTTGCTGAACGCATTTACTGTCACGGTTCCCAAGGACCTATATGTGGTAGAGTATGGTAGC
    AATATGACAATTGAATGCAAATTCCCAGTAGAAAAACAATTAGACCTGGCTGCACTAATTGTCTATTGGG
    AAATGGAGGATAAGAACATTATTCAATTTGTGCATGGAGAGGAAGACCTGAAGGTTCAGCATAGTAGCTA
    CAGACAGAGGGCCCGGCTGTTGAAGGACCAGCTCTCCCTGGGAAATGCTGCACTTCAGATCACAGATGTG
    AAATTGCAGGATGCAGGGGTGTACCGCTGCATGATCAGCTATGGTGGTGCCGACTACAAGCGAATTACTG
    TGAAAGTCAATGCCCCATACAACAAAATCAACCAAAGAATTTTGGTTGTGGATCCAGTCACCTCTGAACA
    TGAACTGACATGTCAGGCTGAGGGCTACCCCAAGGCCGAAGTCATCTGGACAAGCAGTGACCATCAAGTC
    CTGAGTGGTAAGACCACCACCACCAATTCCAAGAGAGAGGAGAAGCTTTTCAATGTGACCAGCACACTGA
    GAATCAACACAACAACTAATGAGATTTTCTACTGCACTTTTAGGAGATTAGATCCTGAGGAAAACCATAC

    The problem I am facing is how to print the sequence with the selected regions in red.

    Desired output:
    <font color="red">GGCGCAACGCTGAGC</font>AGCTGGCGCGTCCCG<font color="red">CGCGGCCCCA</font>GTTCTGCGCA<font color="red">GCTTCCCGAG</font>GCTCCGCACC
    AGCCGCGCTTCTGTCCGCCTGCAGGGCATTCCAGAAAGATGAGGATATTTGCTGTCTTTATATTCATGAC
    CTACTGGCATTTGCTGAACGCATTTACTGTCACGGTTCCCAAGGACCTATATGTGGTAGAGTATGGTAGC
    AATATGACAATTGAATGCAAATTCCCAGTAGAAAAACAATTAGACCTGGCTGCACTAATTGTCTATTGGG
    AAATGGAGGATAAGAACATTATTCAATTTGTGCATGGAGAGGAAGACCTGAAGGTTCAGCATAGTAGCTA
    CAGACAGAGGGCCCGGCTGTTGAAGGACCAGCTCTCCCTGGGAAATGCTGCACTTCAGATCACAGATGTG
    AAATTGCAGGATGCAGGGGTGTACCGCTGCATGATCAGCTATGGTGGTGCCGACTACAAGCGAATTACTG
    TGAAAGTCAATGCCCCATACAACAAAATCAACCAAAGAATTTTGGTTGTGGATCCAGTCACCTCTGAACA
    TGAACTGACATGTCAGGCTGAGGGCTACCCCAAGGCCGAAGTCATCTGGACAAGCAGTGACCATCAAGTC
    CTGAGTGGTAAGACCACCACCACCAATTCCAAGAGAGAGGAGAAGCTTTTCAATGTGACCAGCACACTGA
    GAATCAACACAACAACTAATGAGATTTTCTACTGCACTTTTAGGAGATTAGATCCTGAGGAAAACCATAC

    Thank you and any help will be greatly appreciated.

    Regards

      roboticus's suggestion can easily be adapted to output HTML (why did you stop using it when it "worked like charm"?). Simply replace the part that calls the colored function with your "<font ..." strings. Keep in mind roboticus's comment that the positions for replacements need to be in descending order, so you may want to sort your array of positions accordingly.
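      A sketch of that adaptation, assuming 1-based inclusive ranges taken from @seq (1 .. 15, 30 .. 40, 50 .. 60) and non-overlapping regions; the output file name highlighted.html is hypothetical. Note that the desired output above starts the second and third red regions one base later (at 31 and 51), so the ranges may need adjusting.

```perl
use strict;
use warnings;

# First line of the FASTA file; the real script would read the whole file.
my $sequence = 'GGCGCAACGCTGAGCAGCTGGCGCGTCCCGCGCGGCCCCAGTTCTGCGCAGCTTCCCGAGGCTCCGCACC';
my @ranges   = ([1, 15], [30, 40], [50, 60]);   # [first, last], 1-based, inclusive

my $html = $sequence;
# Highest start first, so the tags inserted here never move the
# offsets of regions that are still to be processed.
for my $r (sort { $b->[0] <=> $a->[0] } @ranges) {
    my ($first, $last) = @$r;
    my $offset = $first - 1;                    # substr() offsets are 0-based
    my $len    = $last - $first + 1;
    substr($html, $offset, $len) =
        '<font color="red">' . substr($html, $offset, $len) . '</font>';
}

# Hypothetical output file; <pre> keeps the sequence in a monospaced font.
my $file = 'highlighted.html';
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "<html><body><pre>$html</pre></body></html>\n";
close $out or die "Cannot close $file: $!";
```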