string complement question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: string complement question by Limbic~Region (Chancellor) on Nov 21, 2003 at 13:45 UTC
Anonymous Monk, Here are the assumptions I made since your original post is unclear: You want to compare each element of an array against every other element in the array You only want to reverse one of the two elements being matched Elements will all be uppercase, contain only the letters T A C G, and will be the same length You do not want to repeat matches #!/usr/bin/perl -w use strict; my @dna = qw(GTTC TTTT GGGG CCCC GAAC); my %seen; for my $index_1 ( 0 .. $#dna ) { next if $seen{$index_1}; $seen{$index_1} = 1; my $dna1 = reverse $dna[ $index_1 ]; for my $index_2 ( 0 .. $#dna ) { next if $seen{$index_2}; my $dna2 = $dna[ $index_2 ]; if ( match($dna1 , $dna2) ) { $seen{$index_2} = 1; print join " : " , $dna1, $index_1, $dna2, $index_2; print $/; } } } sub match { my @dna = map { [ split // ] } @_; my %val = ( A => -1, T => 1, G => -2, C => 2 ); for ( 0 .. $#{$dna[0]} ) { return 0 if $val{ $dna[0]->[$_] } + $val{ $dna[1]->[$_] }; } return 1; } [download] Cheers - L~R	[reply] [d/l]
Re: string complement question by Abigail-II (Bishop) on Nov 21, 2003 at 12:21 UTC
Well, $dna is the reversed complement of `$array1[$i]`. But you are comparing it to itself. Now there will be strings that will match this way (AATT for instance), but you say you want GTTC to match GAAC. That's two things you want to compare. But you are comparing one thing with itself. I'm not sure what you where you want to compare your dna fragments with. Abigail	[reply] [d/l]
Re: string complement question by Art_XIV (Hermit) on Nov 21, 2003 at 13:49 UTC
Will this help? `use strict; my %sequences; while (<DATA>) { chomp; $sequences{$_} = $.; } for my $seq (keys %sequences) { my $rev_compl = get_rev_compl($seq); print "$seq matches $rev_compl on line $sequences{$rev_compl}\n" if exists $sequences{$rev_compl}; } sub get_rev_compl { my ($seq) = @_; $seq =~ tr/GCAT/CGTA/; return reverse $seq; } 1; __DATA__ GATTACA TTAGCAA CATCATC TTGCTAA TGTAATC GATGATG` [download] I put the sequences into a hash instead of an array to (hopefully) increase the lookup speed. This won't work if the sequences aren't unique, though. Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"	[reply] [d/l]
Re: Re: string complement question by thospel (Hermit) on Nov 21, 2003 at 16:44 UTC
Easily fixed by remembering all indices, e.g. like this: `#!/usr/bin/perl -w use strict; chomp(my @sequences = <DATA>); my (%sequences, $i); push @{$sequences{$_}}, $i++ for @sequences; for (keys %sequences) { my $complement = reverse; $complement =~ tr/GCAT/CGTA/; print "$_ complements sequence nr @{$sequences{$complement}}\n" if $sequences{$complement}; } __DATA__ GATTACA GATTACA TTAGCAA CATCATC TTGCTAA TGTAATC GATGATG` [download]	[reply] [d/l]
Re: string complement question by b10m (Vicar) on Nov 21, 2003 at 12:28 UTC
This might be a horribly slow sollution, but I'll give it anyway, and have the real monks take a shot at ;) `my $i; my $j; for ($i=0; $i < @array1; $i++) { $dna = reverse $array1[$i]; $dna =~ tr/actg/tgac/; for ($j=0; $j < @array1; $j++) { if ($dna eq $array1[$j]) { print "FOUND $dna\t$array[$j]\n"; } } }` [download] Hope this is what you mean? -- B10m	[reply] [d/l]
Re: Re: string complement question by Anonymous Monk on Nov 21, 2003 at 13:10 UTC
thanks B10m, but this is still just getting the exact match! oh well...	[reply]
Re: string complement question by b10m (Vicar) on Nov 21, 2003 at 13:23 UTC
Mmmm, my bad. I wasn't paying much attention (apparantly). I was still checking and printing the same stuff, which didn't really make sense. This should do the trick (still slow, and I'm looking forward to faster ways :) `my $i; my $j; my $dna; my $temp_dna; for ($i=0; $i < @array1; $i++) { $dna = $array1[$i]; $temp_dna = $dna; $dna = reverse $dna; $dna =~ tr/actg/tgac/; for ($j=0; $j < @array1; $j++) { if ($dna eq $array1[$j]) { print "FOUND $temp_dna\t$array[$j]\n"; } } }` [download] Please note that I made a lot of steps to get to this. This could be shortened, but this is more readable (for me at least ;) -- B10m	[reply] [d/l]
Re: string complement question by Abigail-II (Bishop) on Nov 21, 2003 at 13:55 UTC
Re: string complement question by duff (Parson) on Nov 21, 2003 at 16:17 UTC
From what you've described, your code does what you want. I.e., the string "gttcgaac" matches because if you compliment it and reverse it you get "gttcgaac" which means that the original string is its own reversed compliment. gttcgaac caagcttg Now, if you want to find other occurences of the reversed compliment within the array, you'll have to cache all of the reversed compliments somewhere and then do equality comparisons against each element of the array. Or did you mean something different? duff	[reply]