in reply to Palindrome sequence from file containing multiple sequences

Here's a slightly shorter version that I think does the same thing, using the regex engine to do all the iteration:

#!/usr/bin/perl
# Palindrome sequence http://perlmonks.org/?node_id=1117312

use strict;
use warnings;

my $contig; # section title
my %palhash;
my $count = 0;

while(<DATA>)
  {
  $count++ < 1000 or last;
  /Contig/ and $contig = $_;
  /([CAGU]{5,20})                              # left part
    ([CAGU]{0,15})                             # middle part
    ((??{reverse($1)=~tr|CAGU|GUCA|r}))        # reverse (right) match
    (??{$palhash{$contig}{"$1*$2*$3"}++})^/x;  # count found, fail
  }

for my $contig ( sort keys %palhash )
  {
  print $contig;
  for my $key ( sort keys %{ $palhash{$contig} } )
    {
    print "$key => $palhash{$contig}{$key}\n";
    }
  print "\n";
  }

__DATA__
Contig1
NAAAAGUAUAGGCUCGAGAGAGAAGUCCUGGCCUAUCGGAUUACACCACGNGUCAGAUCU
GUCACUUCAAGGAGCUUUUCAGCGUCUUUGACAAGAAUGGCNGACGGUUGCCGUCCUUCC
AGGGAGAUUGGGGCAGAAUUCGAGUCACUGCCNGUGGAUUCCUUAGUGAUCCAGAUGUCG
GAUAUAUGUCCAAUAUAGCCAAUNGUUGUUGGACAUAGCACCAUCAAUCAUCAAGAUUGC
UAUUUCGCCACCUCNAUGUAGAGUGGAAGAUCCAAAUGCGGGAAUGGAUUUCAAGGUUGG
UUAAGNAUGGGGGCAGCGGAUGAGUUAACAGUACAGCAAGCUAGGCAGCUCCAUUUNUGA
CUUGGAAUCUUCAUACGAUUCAUUUAUGGCAGCUUUGCCUAAUGUUGNGUACUUGAAAGU
AAAAUCCAAUAUUUCCAUAUUUUUGUUUAGUGCUUCAAAUCUGGUGAGAACGGAUGUUUU
AAGUUGGUAAAACAGUGUAUUCAUUUUGAAGAGUUCUUGGAUUGUUUAAAGCUCAAGAUG
CUAUUUGUGUGCUGCUGAUUUGUCGUUCUAUGAGAAAGAAUAUAUGCUUUAAUUUGGUUU
UGUAAUAUUAANUAUAUUCAUUCCCCUUGAUUGUUGUUUUGUNNNNNNNNNNNNNNNNNN
NN
Contig2
NUGUUUUUUCUUUACUCGGUGUCUCGUCGGUACUCGACGACUAAANGGGNUAUAGGAGGG
GCGCGAACGNCGAAACGGGAGUGGGUACAAAGCGUGGUCCNGGAUAGACGAGAGCGGACG
CGCCGGGUGAAGCUCGGCGCGACGCGAGGCANAGAGGGGCCCGCAGANNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNAAGNCCCAGGAUGCGCAUCGCCCCAGGGGUGAAACCCCCAUCCCAAAGAA
UGCUNCUCCUNCGCGGUAGGGCAGCUNCCCGAAGCACCCGACCGCUUUNAGGCCANCCAU
AUGAUAAAGNAACGUUGUGUGGUGAAUGGGAUGAAGAUGAUUGAAGNAGAGUAGAGUUUU
GCCUCUANAUCUUGAUAUGUAUAUCUUUAAUUAUAUAANUAUAGCUCAUUAUAUUGUGCN
UUAGCAUGUAAUAUUUAAGUCUAAAAUUAANUGGACCUCAGCUCGAGGUCGNCAUUCUUU
GUUACUUUAGAUCAGAUCUGUANUUCCCUUUGUAUUGUUCAGGNUUUCCAACCAUAAAUU
AUUGGUACUAUCUUNAUUGUUAUCAUAAUUGACGUGUGUUUAAGUUCNNNNNNNNNNNNN
NNNNN

Here's the output I get:

Contig1
AAAAU*CCAAUAUUUCCAU*AUUUU => 1
AAUAU*UUCC*AUAUU => 1
AUCCA*AAUGCGGGAA*UGGAU => 1
CAAUC*AUCAA*GAUUG => 1
GAAUC*UUCAUAC*GAUUC => 1
GACGG*UUG*CCGUC => 1
GGCAG*AAUUCGAGUCA*CUGCC => 1
UAUAU*GUCCA*AUAUA => 1
UCUUG*GAUUGUUUAAAGCU*CAAGA => 1
UGGAU*UCCUUAGUG*AUCCA => 1

Contig2
ACCUC*AGCUC*GAGGU => 1
AGAUC*A*GAUCU => 1
CGCCG*GGUGAAGCU*CGGCG => 1
CGUCG*GUACU*CGACG => 1
GACCU*CAGCUCG*AGGUC => 1
GACCUC*AGCUC*GAGGUC => 1
GAUAU*GU*AUAUC => 1
GAUGC**GCAUC => 1
GGGGU*GAA*ACCCC => 1
UAUAU*CUUUAAUU*AUAUA => 1
UCGUC*GGUACUC*GACGA => 1
UCGUCG*GUACU*CGACGA => 1
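
In case the "count found, fail" step looks like magic: the trick is to record each candidate from inside the pattern and then force the match to fail, so the regex engine backtracks through every start position and every length on its own. Here is a stripped-down sketch of the same idea on a toy string. Note that I've swapped in the documented (*FAIL) verb and a (?{ }) code block in place of the (??{ ... })^ construct used in the script, and that @found and the "ABCD" input are just made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @found;    # hypothetical collector, not part of the script above

# (?{ ... }) runs each time the engine reaches it; (*FAIL) then rejects
# the match, so the engine backtracks and tries the next possibility.
"ABCD" =~ /(\w{2,3})(?{ push @found, $1 })(*FAIL)/;

print "@found\n";    # prints "ABC AB BCD BC CD"
```

The overall match never succeeds, which is the point: the side effect inside the code block is the real result.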

By the way, there may be a problem with overlapping sequences. Your initial solution doesn't look like it allows them.
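
To show what I mean by overlapping, here is a small sketch. It assumes the matches are collected with an ordinary consuming /g match, which may or may not be what your code does; the "AAAA" string and the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "AAAA";    # toy input, not taken from the contig data

# A consuming match resumes after the end of the previous match,
# so the "AA" starting at offset 1 is never reported.
my @plain = $seq =~ /(AA)/g;

# A zero-width lookahead consumes nothing, so every start offset is tried.
my @overlapping = $seq =~ /(?=(AA))/g;

print scalar(@plain), " vs ", scalar(@overlapping), "\n";    # prints "2 vs 3"
```

Whether the overlapping hits are wanted depends on what you plan to do with the palindromes, so treat this as something to check rather than a definite bug.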

Re^2: Palindrome sequence from file containing multiple sequences
by reciter (Novice) on Feb 21, 2015 at 04:52 UTC
    Thank you for your help
    have a nice day