Diesel has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody, I need help with a Perl/CGI task.

I reported my code below; I simplified it without using real data so to make it shorter

Essentially I have an hash of many sequences (only reported 2 in the example below)

In my hash the keys are the names of the sequences, while the values are the nucleotide sequences (DNA)

I want to iterate through the hash and filter it to make another hash only with those sequences which have

at least one of the patterns AGGAG or TTTTT; also if one, or both this patterns are present, I want to highlight them

in the sequences with 2 different colours. Until here, everything is fine. However some of my real sequences are very long (1000 nucleotides)

so I would like to print each sequence with 50 or 60 characters per line in my cgi page, to look more tidy.

In the code below I filtered the first hash with a regex and I added coloured tags to those 2 patterns I am looking for

The problem is that the unpack function messes up with the presence of the tags, which are included in the

characters count when unpacking and printing; therefore each line end up being of different length, or, even worse, some tag break because

they span 2 lines if they are in the location at which the unpacking is going to the next line

I have been trying to sort this out but I can not find a solution

I am a beginner, and I would like to find a simple solution, if there is one; without using extra modules, if possible

also because I do not have admin privileges in the Unix machine I am using

Many thanks

#!/usr/bin/perl use strict; use warnings; use Bio::SeqIO; use CGI; my $cgi = new CGI; print $cgi->header(); print <<__EOF; <html> <head> <style type='text/css'> <!-- body { background: lightgrey; color: black; font-family: Courier; margin:20px; } h1 { color: black; font-family: Verdana; font-size: 150%; } h2 { color:red; font-family: Courier; } .sd { color:red; } .terminator { color: royalblue; } --> </head> </style> <h1> Sequences with <span class="sd"> SD sequence</span> or <span class="t +erminator">terminator structure</span>: </h1> <body> __EOF my %hash; my $name1= "name1"; my $name2= "name2"; $hash{$name1} = "ATATTATCCCCCTATATATGGAGGGAGAGGGGGGGGGGGGGGGGGGGGGGGGG +GGAGAGAGGAGATTTTTTTTTTTTTTTT" $hash{$name2} = "ATATATTATATATATTATATTCGCGCGCGCGGCGCGCGCGGCGCGCGCGTTTT +TTTTTTTTTTAGGAGAGAGAGGGAGGAGGAGAGGGGAGT" foreach my $key (keys %hash) { if ($hash{$key} =~ s!(aggag)!<span class="sd">$1</span>!gi) { + $hash2{$key} = $hash{$key}; } if ($hash{$key} =~ s!(ttttt)!<span class="terminator">$1</span>!ig) + { $hash2{$key} = $hash{$key}; } my %hash2; foreach my $key (keys %hash2) { print "<p> <b>$key</b><input type='radio' name='selected_intergenic' v +alue='$hash2{$key}'/> </p>"; print "<p>$_</p>\n" for unpack '(A50)*', $hash2{$key}; } print <<__EOF; <br /> </body> </html> __EOF

Replies are listed 'Best First'.
Re: Print N characters per line in Cgi Script with Html Tags
by choroba (Cardinal) on Jun 13, 2017 at 15:34 UTC
    You need to colour the sentences after you split them into lines. It complicates the code a bit, but it's not that hard: just remember the positions where colours start, unpack the lines, and then compute what line and position the markup belongs to. Note that the positions are sorted in reverse, so that inserting the markup doesn't change the positions to process.

    sub wrap { my ($lines, $pos, $markup) = @_; my $idx = int($pos / 50); substr $lines->[$idx], $pos % 50, 0, $markup; } my %dna; my $name1 = "name1"; my $name2 = "name2"; $dna{$name1} = "ATATTATCCCCCTATATATGGAGGGAGAGGGGGGGGGGGGGGGGGGGGGGGGGG +GAGAGAGGAGATTTTTTTTTTTTTTTT"; $dna{$name2} = "ATATATTATATATATTATATTCGCGCGCGCGGCGCGCGCGGCGCGCGCGTTTTT +TTTTTTTTTAGGAGAGAGAGGGAGGAGGAGAGGGGAGT"; my %class =(AGGAG => 'sd', TTTTT => 'terminator'); my $regex = join '|', keys %class; $regex = qr/((?i:$regex))/; my %highlight; for my $name (keys %dna) { while ($dna{$name} =~ /$regex/g) { $highlight{$name}{ pos($dna{$name}) - 5 } = $class{$1}; } } for my $key (keys %highlight) { print "<p> <b>$key</b><input type='radio' name='selected_intergeni +c' value='$dna{$key}'/> </p>"; my @lines = unpack '(A50)*', $dna{$key}; for my $pos (sort { $b <=> $a } keys %{ $highlight{$key} }) { my $class = $highlight{$key}{$pos}; wrap(\@lines, $pos + 5, '</span>'); wrap(\@lines, $pos, "<span class='$class'>"); } print join "<br />", @lines; }

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Many Thanks this is working and it is exactly what I was trying to do. Many many thanks I have been trying to sort this out for a while now, and I was abou to give up and have a massively wide page. Do you think (be honest) this was a a good question? I could not find a similar question anywhere, and when I have tried asking it on SO, I have been heavily downvoted, criticized and offended. That is why I was also very hesitant into asking here; do not wanna appear as lazy, but I tried all I could think of. I honestly thought this was a decent question for beginners and I tried providing as much material as I could. Thanks again.
      I was wondering: I have noticed you use "5" in your solution; what if the patterns are of different length?
        You can store the lengths in the %class hash:
        my %class =(AGGAG => { name => 'sd' }, TT => { name => 'terminator' }); # I shortened the ter +minator for testing $class{$_}{length} = length $_ for keys %class; # ... $highlight{$name}{ pos($dna{$name}) - $class{$1}{length} } = $ +class{$1}; # ... my $class = $highlight{$key}{$pos}; wrap(\@lines, $pos + $class->{length}, '</span>'); wrap(\@lines, $pos, "<span class='$class->{name}'>");

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Print N characters per line in Cgi Script with Html Tags
by hippo (Archbishop) on Jun 13, 2017 at 15:07 UTC

    If I understand you correctly, all you need is something like this:

    #!/usr/bin/env perl use strict; use warnings; my $in = "ATATATTATATATATTATATTCGCGCGCGCGGCGCGCGCGGCGCGCGCGTTTTTTTTTTT +TTTAGGAGAGAGAGGGAGGAGGAGAGGGGAGT"; my $out = '<pre>' . join ("\n", $in =~ /[ATCG]{1,50}/g) . "</pre>\n"; print $out;

    Use HTML pre tags for preformatted output.

      this does not work; lines end up being of different length and the colours disappear. To simplify my task to the minimum, imagine, still inside a CGI script you have 2 strings:
      my $a: "ABCDEFGHI" my $b: "<b>ABC</b>DEFGHI"
      I want to print this 2 strings with 4 characters per line and in the second string ABC has to be in bold; If I use my method, or your method, this is what I get:
      $a: ABCD EFGH I $b: <b>A BC</ b>DE FGHI
      I want the bold tags ingored in the characters counts: So:
      $a: ABCD EFGH I $b: <b>ABC</b>D EFGH I
      Also as I said, when I try your code it also does something with the coloured tags I used; Note: in this example I am using bold tag to make it shorter, in my scrip I am using a span class as shown. I hope it is more clear. I do not know in advance where the position it is: i need an universal solution which can print something with the same number of characters per line, basically ignoring the tags in the counts.

        I see. If you have tags already in your data and want to skip over them then the best plan is to use a proper HTML parser.

        If you cannot (or will not) use a parser then a crude plan B which works for your supplied dataset is:

        #!/usr/bin/env perl use strict; use warnings; use Test::More; my @set = ( { in => 'ABCDEFGHI', want => "ABCD\nEFGH\nI\n" }, { in => '<b>ABC</b>DEFGHI', want => "<b>ABC</b>D\nEFGH\nI\n" }, ); plan tests => scalar @set; my $len = 4; for my $x (@set) { my $i = 0; my $out = ''; my $intag = 0; for my $c (split (//, $x->{in})) { $out .= $c; $intag++ if $c eq '<'; $intag-- if $c eq '>'; next if $intag || $c eq '>'; $i++; $out .= "\n" unless $i % $len; } $out .= "\n"; is ($out, $x->{want}); }

        This isn't robust (and is rather C-ish for my taste) but it serves to illustrate this approach in general terms. Have fun with it.

        Update: edited source for improved generality.

Re: Print N characters per line in Cgi Script with Html Tags
by poj (Abbot) on Jun 13, 2017 at 16:44 UTC

    This counts along the string and inserts line breaks at regular intervals

    #!/usr/bin/perl use strict; use warnings; use CGI; my $WIDTH = 40; my %hash = ( 'name1' => "ATATTATCCCCCTATATATGGAGGGAGAGGGGGGGGGGGGGGGGGGGGGGGGGGGAG +AGAGGAGATTTTTTTTTTTTTTTT", 'name2' => "ATATATTATATATATTATATTCGCGCGCGCGGCGCGCGCGGCGCGCGCGTTTTTTTT +TTTTTTAGGAGAGAGAGGGAGGAGGAGAGGGGAGT", ); my @select=(); my %pattern = ( 'AGGAG' => 'sd', 'TTTTTTT' => 'terminator' ); my $re = join '|',keys %pattern; # add markup foreach my $key (sort keys %hash) { if ($hash{$key} =~ s!($re)!<span class="$pattern{$1}">$1</span>!g){ push @select,$key; } } # create table my $table = q!<table width="100" cellpadding="5" cellspacing="5"> <tr style="font-style:italic"> <td width="20%">Key</td> <td width="5%">&nbsp;</td> <td width="75%">Sequence</td> </tr>!; # selected keys foreach my $key (@select) { my $seq = $hash{$key}; # count ACTG character to determine # position to insert line breaks my $count = 0; my @br; for my $p (0..length($seq)-1){ ++$count if substr($seq,$p,1) =~ /[ACTG]/; if ($count > $WIDTH){ push @br,$p; $count = 1; }; } # insert line breaks working backwards for my $p (reverse @br){ substr($seq,$p,0,'<br/>'); } # create row my $radio = qq!<input type="radio" name="selected_intergenic" value= +"$key"/>!; $table .= qq!<tr valign="top"> <td>$key</td> <td>$radio</td> <td>$seq</td></tr>!; } $table .= q!</table>!; # html page my $q = CGI->new; my $style = <<EOF; body { background: lightgrey; color: black; font-family: Courier; margin:20px; } h1 { color: black; font-family: Verdana; font-size: 150%; } h2 { color:red; font-family: Courier; } .sd { color:red; } .terminator { color: royalblue; } EOF my $note = qq!Sequences with <span class="sd"> SD sequence</span> or <span class="terminator">terminator structure</span>!; print $q->header, $q->start_html(-style=>{'code'=>$style}), $q->h1( $note ), $table, $q->end_html;
    poj