reciter has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I am trying to find palindromic sequences in a file that contains multiple sequences.

The script reports the palindromes present in the file, but it does not print the title of the sequence each one belongs to.

Also, after adding "use strict" and "use warnings", the script stops running and shows a compilation error.

Please help me.

Here is the script I am using:

#!/usr/bin/perl
open (TEXT, "sample.txt")||die"Cannot";
my $pat=qr'^(Contig +([0-9]*))\s';
my $count = 0;
for my $n (5..20) {
    my $re = qr /[CAGU]{$n}/;
    $regexes[$n-5] = $re;
}
NEXTLINE: while ($count < 1000) {
    my $line = <TEXT> ;
    $count++;
    foreach my $value (@regexes) {
        my $start = 0;
        while ($line =~ /$value/g) {
            my $endline = $';
            my $match = $&;
            my $revmatch = reverse($match);
            $revmatch =~ tr/CAGU/GUCA/;
            if ($endline =~ /^([CAGU]{0,15})($revmatch)/) {
                $start = 1;
                my $palindrome = $match . "*" . $1 . "*" . $2;
                $palhash{$palindrome}++;
            }
        }
        if ($start == 0) {
            goto NEXTLINE;
        }
    }
}
print "L.$pat\n";
close TEXT;
while(($key, $value) = each (%palhash)) {
    print "$key => $value\n";
}
exit;

Here is an example of sample.txt:

Contig1
NAAAAGUAUAGGCUCGAGAGAGAAGUCCUGGCCUAUCGGAUUACACCACGNGUCAGAUCU
GUCACUUCAAGGAGCUUUUCAGCGUCUUUGACAAGAAUGGCNGACGGUUGCCGUCCUUCC
AGGGAGAUUGGGGCAGAAUUCGAGUCACUGCCNGUGGAUUCCUUAGUGAUCCAGAUGUCG
GAUAUAUGUCCAAUAUAGCCAAUNGUUGUUGGACAUAGCACCAUCAAUCAUCAAGAUUGC
UAUUUCGCCACCUCNAUGUAGAGUGGAAGAUCCAAAUGCGGGAAUGGAUUUCAAGGUUGG
UUAAGNAUGGGGGCAGCGGAUGAGUUAACAGUACAGCAAGCUAGGCAGCUCCAUUUNUGA
CUUGGAAUCUUCAUACGAUUCAUUUAUGGCAGCUUUGCCUAAUGUUGNGUACUUGAAAGU
AAAAUCCAAUAUUUCCAUAUUUUUGUUUAGUGCUUCAAAUCUGGUGAGAACGGAUGUUUU
AAGUUGGUAAAACAGUGUAUUCAUUUUGAAGAGUUCUUGGAUUGUUUAAAGCUCAAGAUG
CUAUUUGUGUGCUGCUGAUUUGUCGUUCUAUGAGAAAGAAUAUAUGCUUUAAUUUGGUUU
UGUAAUAUUAANUAUAUUCAUUCCCCUUGAUUGUUGUUUUGUNNNNNNNNNNNNNNNNNN
NN
Contig2
NUGUUUUUUCUUUACUCGGUGUCUCGUCGGUACUCGACGACUAAANGGGNUAUAGGAGGG
GCGCGAACGNCGAAACGGGAGUGGGUACAAAGCGUGGUCCNGGAUAGACGAGAGCGGACG
CGCCGGGUGAAGCUCGGCGCGACGCGAGGCANAGAGGGGCCCGCAGANNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNAAGNCCCAGGAUGCGCAUCGCCCCAGGGGUGAAACCCCCAUCCCAAAGAA
UGCUNCUCCUNCGCGGUAGGGCAGCUNCCCGAAGCACCCGACCGCUUUNAGGCCANCCAU
AUGAUAAAGNAACGUUGUGUGGUGAAUGGGAUGAAGAUGAUUGAAGNAGAGUAGAGUUUU
GCCUCUANAUCUUGAUAUGUAUAUCUUUAAUUAUAUAANUAUAGCUCAUUAUAUUGUGCN
UUAGCAUGUAAUAUUUAAGUCUAAAAUUAANUGGACCUCAGCUCGAGGUCGNCAUUCUUU
GUUACUUUAGAUCAGAUCUGUANUUCCCUUUGUAUUGUUCAGGNUUUCCAACCAUAAAUU
AUUGGUACUAUCUUNAUUGUUAUCAUAAUUGACGUGUGUUUAAGUUCNNNNNNNNNNNNN
NNNNN

I want my result in this form (but that is not what I get):

Contig1
GACGG*UUG*CCGUC => 1
GAUGC**GCAUC => 1
CGCCG*GGUGAAGCU*CGGCG => 1
Contig2
CAAUC*AUCAA*GAUUG => 1
GAUAU*GU*AUAUC => 1
AAAAU*CCAAUAUUUCCAU*AUUUU => 1

PROBLEM SOLVED

Replies are listed 'Best First'.
Re: Palindrome sequence from file containing multiple sequences
by choroba (Cardinal) on Feb 20, 2015 at 10:50 UTC
    If you want to record the contigs, don't save just the palindromes in the hash, but group them by contig first:
    #!/usr/bin/perl
    use warnings;
    use strict;

    open my $IN, '<', 'sample.txt' or die $!;
    my $pat = qr/^(Contig *([0-9]*))\s/;
    my $count = 0;
    my @regexes;
    for my $n (5 .. 20) {
        my $re = qr /[CAGU]{$n}/;
        $regexes[$n-5] = $re;
    }

    my %palhash;
    my $contig;
    LINE: while ($count < 1000) {
        my $line = <$IN> ;
        defined $line or last;
        $contig = $line if $line =~ /$pat/;
        ++$count;
        for my $value (@regexes) {
            my $start = 0;
            while ($line =~ /$value/g) {
                my $endline = $';
                my $match = $&;
                my $revmatch = reverse($match);
                $revmatch =~ tr/CAGU/GUCA/;
                if ($endline =~ /^([CAGU]{0,15})($revmatch)/) {
                    $start = 1;
                    my $palindrome = $match . "*" . $1 . "*" . $2;
                    $palhash{$contig}{$palindrome}++;
                }
            }
            next LINE if $start == 0;
        }
    }
    close $IN;

    for my $contig (keys %palhash) {
        print $contig;
        while (my ($key, $value) = each (%{ $palhash{$contig} })) {
            print "$key => $value\n";
        }
    }

    I changed some other parts of the code as well: three-argument open with "or die", a lexical filehandle, stopping as soon as the input runs out, and no goto. I don't understand the requirements in detail, but moving away from $' and $& would be a good idea, too.
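One way to move away from $' and $& is to capture the stem explicitly and take the remainder of the line with substr and pos. The following is only a minimal sketch under that assumption: find_palindromes is a hypothetical helper that does not appear in the thread, and it handles a single stem length rather than the full 5..20 loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_palindromes is a hypothetical helper, not part of the thread's script.
# It returns "stem*loop*stem" strings for stems of exactly $n bases.
sub find_palindromes {
    my ($line, $n) = @_;
    my @found;
    while ($line =~ /([CAGU]{$n})/g) {
        my $match = $1;                          # replaces $&
        my $rest  = substr($line, pos($line));   # replaces $'
        my $revmatch = reverse $match;
        $revmatch =~ tr/CAGU/GUCA/;              # reverse complement of the stem
        if ($rest =~ /^([CAGU]{0,15})($revmatch)/) {
            push @found, "$match*$1*$2";
        }
    }
    return @found;
}

# A fragment of the Contig1 sample data from the question.
print "$_\n" for find_palindromes('GGCNGACGGUUGCCGUCCUUCC', 5);
# prints GACGG*UUG*CCGUC
```

The inner match runs against a copy ($rest), so it does not disturb pos($line) and the outer /g loop continues where it left off.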

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      hey choroba, many thanks for your help.

      have a nice day
Re: Palindrome sequence from file containing multiple sequences
by Anonymous Monk on Feb 21, 2015 at 00:38 UTC

    Here's a slightly shorter version that I think does the same thing, using the regex engine to do all the iteration:

    #!/usr/bin/perl
    # Palindrome sequence http://perlmonks.org/?node_id=1117312
    use strict;
    use warnings;

    my $contig; # section title
    my %palhash;
    my $count = 0;

    while(<DATA>)
      {
      $count++ < 1000 or last;
      /Contig/ and $contig = $_;
      /([CAGU]{5,20})                             # left part
        ([CAGU]{0,15})                            # middle part
        ((??{reverse($1)=~tr|CAGU|GUCA|r}))       # reverse (right) match
        (??{$palhash{$contig}{"$1*$2*$3"}++})^/x; # count found, fail
      }

    for my $contig ( sort keys %palhash )
      {
      print $contig;
      for my $key ( sort keys %{ $palhash{$contig} } )
        {
        print "$key => $palhash{$contig}{$key}\n";
        }
      print "\n";
      }

    __DATA__
    Contig1
    NAAAAGUAUAGGCUCGAGAGAGAAGUCCUGGCCUAUCGGAUUACACCACGNGUCAGAUCU
    GUCACUUCAAGGAGCUUUUCAGCGUCUUUGACAAGAAUGGCNGACGGUUGCCGUCCUUCC
    AGGGAGAUUGGGGCAGAAUUCGAGUCACUGCCNGUGGAUUCCUUAGUGAUCCAGAUGUCG
    GAUAUAUGUCCAAUAUAGCCAAUNGUUGUUGGACAUAGCACCAUCAAUCAUCAAGAUUGC
    UAUUUCGCCACCUCNAUGUAGAGUGGAAGAUCCAAAUGCGGGAAUGGAUUUCAAGGUUGG
    UUAAGNAUGGGGGCAGCGGAUGAGUUAACAGUACAGCAAGCUAGGCAGCUCCAUUUNUGA
    CUUGGAAUCUUCAUACGAUUCAUUUAUGGCAGCUUUGCCUAAUGUUGNGUACUUGAAAGU
    AAAAUCCAAUAUUUCCAUAUUUUUGUUUAGUGCUUCAAAUCUGGUGAGAACGGAUGUUUU
    AAGUUGGUAAAACAGUGUAUUCAUUUUGAAGAGUUCUUGGAUUGUUUAAAGCUCAAGAUG
    CUAUUUGUGUGCUGCUGAUUUGUCGUUCUAUGAGAAAGAAUAUAUGCUUUAAUUUGGUUU
    UGUAAUAUUAANUAUAUUCAUUCCCCUUGAUUGUUGUUUUGUNNNNNNNNNNNNNNNNNN
    NN
    Contig2
    NUGUUUUUUCUUUACUCGGUGUCUCGUCGGUACUCGACGACUAAANGGGNUAUAGGAGGG
    GCGCGAACGNCGAAACGGGAGUGGGUACAAAGCGUGGUCCNGGAUAGACGAGAGCGGACG
    CGCCGGGUGAAGCUCGGCGCGACGCGAGGCANAGAGGGGCCCGCAGANNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNAAGNCCCAGGAUGCGCAUCGCCCCAGGGGUGAAACCCCCAUCCCAAAGAA
    UGCUNCUCCUNCGCGGUAGGGCAGCUNCCCGAAGCACCCGACCGCUUUNAGGCCANCCAU
    AUGAUAAAGNAACGUUGUGUGGUGAAUGGGAUGAAGAUGAUUGAAGNAGAGUAGAGUUUU
    GCCUCUANAUCUUGAUAUGUAUAUCUUUAAUUAUAUAANUAUAGCUCAUUAUAUUGUGCN
    UUAGCAUGUAAUAUUUAAGUCUAAAAUUAANUGGACCUCAGCUCGAGGUCGNCAUUCUUU
    GUUACUUUAGAUCAGAUCUGUANUUCCCUUUGUAUUGUUCAGGNUUUCCAACCAUAAAUU
    AUUGGUACUAUCUUNAUUGUUAUCAUAAUUGACGUGUGUUUAAGUUCNNNNNNNNNNNNN
    NNNNN

    Here's the output I get:

    Contig1
    AAAAU*CCAAUAUUUCCAU*AUUUU => 1
    AAUAU*UUCC*AUAUU => 1
    AUCCA*AAUGCGGGAA*UGGAU => 1
    CAAUC*AUCAA*GAUUG => 1
    GAAUC*UUCAUAC*GAUUC => 1
    GACGG*UUG*CCGUC => 1
    GGCAG*AAUUCGAGUCA*CUGCC => 1
    UAUAU*GUCCA*AUAUA => 1
    UCUUG*GAUUGUUUAAAGCU*CAAGA => 1
    UGGAU*UCCUUAGUG*AUCCA => 1

    Contig2
    ACCUC*AGCUC*GAGGU => 1
    AGAUC*A*GAUCU => 1
    CGCCG*GGUGAAGCU*CGGCG => 1
    CGUCG*GUACU*CGACG => 1
    GACCU*CAGCUCG*AGGUC => 1
    GACCUC*AGCUC*GAGGUC => 1
    GAUAU*GU*AUAUC => 1
    GAUGC**GCAUC => 1
    GGGGU*GAA*ACCCC => 1
    UAUAU*CUUUAAUU*AUAUA => 1
    UCGUC*GGUACUC*GACGA => 1
    UCGUCG*GUACU*CGACGA => 1
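The script above relies on a pattern that records each hit and then deliberately fails, so the regex engine backtracks through every remaining candidate. A stripped-down sketch of the same idea on a single string may make that easier to follow. Note the substitutions: it uses (?{ ... }) with (*FAIL) instead of the (??{ ... })^ trick, and pushes onto an array (@hits, a name chosen here) instead of counting in a hash. It assumes Perl 5.14 or later for tr///r.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = 'GACGGUUGCCGUC';
my @hits;

$seq =~ /
    ([CAGU]{5,20})                            # left stem
    ([CAGU]{0,15})                            # loop between the stems
    ((??{ reverse($1) =~ tr|CAGU|GUCA|r }))   # reverse complement of the stem
    (?{ push @hits, "$1*$2*$3" })             # record this candidate
    (*FAIL)                                   # force backtracking to the next one
/x;

print "$_\n" for @hits;
```

Because the pattern never succeeds, the match itself returns false; the useful result is the side effect collected in @hits.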

    By the way, there may be a problem with overlapping sequences. Your initial solution doesn't look like it allows them.
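One reading of the overlap concern: with a plain /g loop, each search resumes where the previous match ended, so a stem that starts inside an earlier match is never tried. A small sketch of the difference, using a zero-width lookahead so that every start position is examined (the sequence is a fragment of Contig2 from the sample data):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = 'GACCUCAGCUCGAGGUC';

# Plain /g: the next search starts where the previous match ended.
my @plain = $seq =~ /([CAGU]{5})/g;

# Lookahead: the match is zero-width, so every start position is tried.
my @overlap = $seq =~ /(?=([CAGU]{5}))/g;

print scalar(@plain),   " windows without overlap\n";
print scalar(@overlap), " windows with overlap\n";
```

For this 17-base string the first list holds 3 windows and the second holds 13, one per possible start position.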

      Thank you for your help
      have a nice day