GSperlbio has asked for the wisdom of the Perl Monks concerning the following question:

I have two sequences stored in two arrays, as shown below:

    @seq1 = "ATGC TGCT GCTA CTAA TAAC";
    @seq2 = "GTCA";

I want to compare the string in @seq2 with every word in @seq1 and print the position and the number of mismatches at that position. For example, at the first index, comparing ATGC with GTCA gives 3 mismatches; at the second index, comparing TGCT with GTCA gives 3 mismatches; and so on. I would be grateful if anyone could show me code to do this. Thank you.

Replies are listed 'Best First'.
Re: perl: comparing words in the arrays
by choroba (Cardinal) on Jul 28, 2015 at 13:59 UTC
    It would be better if @seq1 was an array of words, not an array with just a single element.

    You can use an old trick: XOR the two strings and count the number of non-null characters:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @seq1 = qw(ATGC TGCT GCTA CTAA TAAC);
    my $seq2 = 'GTCA';

    for my $seq (@seq1) {
        my $count = ($seq ^ $seq2) =~ tr/\0//c;
        print "$seq $count\n";
    }
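To make the trick a little more visible, here is a minimal sketch that prints the raw XOR bytes for the first word and then the 0-based index of every word with its mismatch count. The helper name `mismatch_count` and the length check are my own additions for illustration, not part of the code above; the sketch assumes plain single-byte strings of equal length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the positions at which
# two equal-length strings differ, using the string XOR trick.
sub mismatch_count {
    my ($left, $right) = @_;
    die "strings differ in length\n" if length($left) != length($right);

    # Identical characters XOR to "\0"; differing ones give a non-null byte.
    my $diff = $left ^ $right;

    # tr/\0//c counts every character that is not "\0" without changing $diff.
    return $diff =~ tr/\0//c;
}

my @seq1 = qw(ATGC TGCT GCTA CTAA TAAC);
my $seq2 = 'GTCA';

# Show the XOR result for the first word as hex bytes: 00 marks a match.
my $xor = $seq1[0] ^ $seq2;
printf "%s ^ %s = %s\n", $seq1[0], $seq2,
    join ' ', map { sprintf '%02x', ord } split //, $xor;

# Print the 0-based index of each word, the word, and its mismatch count.
for my $i ( 0 .. $#seq1 ) {
    printf "%d %s %d\n", $i, $seq1[$i], mismatch_count( $seq1[$i], $seq2 );
}
```

The length check is there because XOR on strings of different lengths would count the extra characters of the longer string as mismatches, which may or may not be what you want.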
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: perl: comparing words in the arrays
by johngg (Canon) on Jul 28, 2015 at 14:16 UTC

    choroba has provided a solution giving the number of mismatches, but if you also want their positions (as I think you say), then a regex and pos might help. This code gives 1-based positions; subtract 1 in the regex code block if you need 0-based ones.

    $ perl -Mstrict -Mwarnings -E '
    my @seqs = qw{ ATGC TGCT GCTA CTAA TAAC };
    my $match = q{GTCA};
    foreach my $seq ( @seqs )
    {
        my @posns;
        my $diff = $seq ^ $match;
        my $nMismatches = () = $diff =~ m{([^\0])(?{ push @posns, pos $diff })}g;
        say qq{$seq : $match -> $nMismatches at @posns};
    }'
    ATGC : GTCA -> 3 at 1 3 4
    TGCT : GTCA -> 3 at 1 2 4
    GCTA : GTCA -> 2 at 2 3
    CTAA : GTCA -> 2 at 1 3
    TAAC : GTCA -> 4 at 1 2 3 4
    $
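If the embedded regex code block feels too clever, the same positions can be collected with a plain index loop. This is only a sketch: the helper name `mismatch_positions` and the leading word index in the output are assumptions of mine, and it expects strings of equal length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the 1-based positions
# at which two equal-length strings differ.
sub mismatch_positions {
    my ($left, $right) = @_;
    die "strings differ in length\n" if length($left) != length($right);

    # Keep each 0-based offset whose characters differ, then shift to 1-based.
    return map  { $_ + 1 }
           grep { substr( $left, $_, 1 ) ne substr( $right, $_, 1 ) }
           0 .. length($left) - 1;
}

my @seqs  = qw(ATGC TGCT GCTA CTAA TAAC);
my $match = 'GTCA';

# Print the 0-based index of each word, the mismatch count and the positions.
for my $i ( 0 .. $#seqs ) {
    my @posns = mismatch_positions( $seqs[$i], $match );
    printf "%d: %s : %s -> %d at %s\n",
        $i, $seqs[$i], $match, scalar @posns, "@posns";
}
```

For the data in this thread it should report the same counts and positions as the output above, with the index of each word added in front. Drop the `+ 1` in the `map` if you prefer 0-based positions.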

    I hope this is helpful.

    Cheers,

    JohnGG

Re: perl: comparing words in the arrays
by thanos1983 (Parson) on Jul 28, 2015 at 13:42 UTC

    Hello GSperlbio

    I think you will find everything you need here: Compare two arrays

    Update: Reading your question again, I got stuck at the point where you say you want to compare every single word. The arrays you provide as an example contain the same letters in different orders, so I want to ask whether you mean to compare every "letter" or every "word". The solution is completely different depending on the answer.

    Update 2: Maybe something like this?

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @seq1 = qw(ATGC TGCT GCTA CTAA GTCA TAAC);
    my @seq2 = qw(GTCA TGCT);

    my $i = 0;
    foreach my $array1Element (@seq1) {
        foreach my $array2Element (@seq2) {
            if ($array1Element eq $array2Element) {
                print "Matched: " . $array1Element . " at position " . $i . " of \@seq1.\n";
            }
        }
        $i++;
    }

    __END__
    Matched: TGCT at position 1 of @seq1.
    Matched: GTCA at position 4 of @seq1.

    Update 3: More complicated than the proposed solutions, but I could not resist playing around. Maybe you will like it:

    Update 4: Sorry, I was printing matches rather than mismatches. I have modified the code, which now also walks the positions in sorted order. This looks better.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @seq1 = qw(ATGC TGCT GCTA CTAA TAAC);
    my @seq2 = qw(GTCA);

    my @AoHSeq1        = AoH(@seq1);
    my @charactersSeq2 = splitStringToCharacters(@seq2);
    my %seq2Hash       = ArrayToHash(@charactersSeq2);

    my @result;

    foreach my $arrayHashNum ( 0 .. $#AoHSeq1 ) {
        foreach my $key ( sort { $a <=> $b } keys %{ $AoHSeq1[$arrayHashNum] } ) {
            my $character1 = $AoHSeq1[$arrayHashNum]{$key};
            my $character2 = $seq2Hash{$key};

            # Compare only the characters that share the same position.
            if ( $character1 ne $character2 ) {
                push @result,
                    "On element: $arrayHashNum from \@seq1, character $character1"
                  . " mismatched $character2 from \@seq2 at position $key";
            }
        }
    }

    print Dumper \@result;

    # Split a string into a list of single characters.
    sub splitStringToCharacters {
        my @characters = split( //, $_[0] );
        return @characters;
    }

    # Map each list element to its 0-based position: ( 0 => 'A', 1 => 'T', ... ).
    sub ArrayToHash {
        my $i = 0;
        my %hash = map { $i++ => $_ } @_;
        return %hash;
    }

    # Build an array of hashes, one position => character hash per word.
    sub AoH {
        my @characters    = ();
        my @AoHCharacters = ();
        my %charactersSeq = ();
        foreach my $element (@_) {
            @characters    = splitStringToCharacters($element);
            %charactersSeq = ArrayToHash(@characters);
            push @AoHCharacters, {%charactersSeq};
        }
        return @AoHCharacters;
    }

    __END__
    $VAR1 = [
              'On element: 0 from @seq1, character A mismatched G from @seq2 at position 0',
              'On element: 0 from @seq1, character G mismatched C from @seq2 at position 2',
              'On element: 0 from @seq1, character C mismatched A from @seq2 at position 3',
              'On element: 1 from @seq1, character T mismatched G from @seq2 at position 0',
              'On element: 1 from @seq1, character G mismatched T from @seq2 at position 1',
              'On element: 1 from @seq1, character T mismatched A from @seq2 at position 3',
              'On element: 2 from @seq1, character C mismatched T from @seq2 at position 1',
              'On element: 2 from @seq1, character T mismatched C from @seq2 at position 2',
              'On element: 3 from @seq1, character C mismatched G from @seq2 at position 0',
              'On element: 3 from @seq1, character A mismatched C from @seq2 at position 2',
              'On element: 4 from @seq1, character T mismatched G from @seq2 at position 0',
              'On element: 4 from @seq1, character A mismatched T from @seq2 at position 1',
              'On element: 4 from @seq1, character A mismatched C from @seq2 at position 2',
              'On element: 4 from @seq1, character C mismatched A from @seq2 at position 3'
            ];

    Hope this helps.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
      I asked to compare every single word in @seq1 with $seq2 and to print the index position of each word together with the number of mismatches at that position.