dmunoze has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I hope my problem is more complicated than the title suggests. I need to compare a series of 100 sequences against the original, and the program should report the number of identities (matching positions) for each one. For example:

Original sequence: ATCGGGACG
Sequence 0: TCGTCAGCG
Sequence 1: ATGCGAAAA

Then the program should print:

Sequence 0 has 2 identities
Sequence 1 has 4 identities

My code requires a file containing the original sequence (you will see this at the beginning, where it says "original.FASTA"), so I don't know whether you can run it without that file. Thanks in advance.

#!/usr/bin/perl
use strict;
use warnings;

# Read the sequence file
my $secuencia = 'original.FASTA';

# Open the file (stop with a message if it can't be opened)
open(my $fh, '<', $secuencia) or die "Cannot open $secuencia: $!";

# Read the sequence
my @secuencia = <$fh>;
close $fh;

# Remove the title (header) line
$secuencia[0] = "";

#print "This is the sequence:\n\n";
#print @secuencia;

# Convert the array into a single string with join
my $DNA = join('', @secuencia);

# Concatenate the lines: remove line breaks and any other whitespace
$DNA =~ s/\s//g;
print "The original sequence is: \n\n", $DNA, "\n\n";

# Random sequences
my $test = "ACGT";
my @DNA = split(//, $test);
for (my $j = 0; $j < 100; ++$j) {
    my $salida = "";
    for (my $i = 0; $i < 541; ++$i) {
        my $random = int(rand(length($test)));
        $salida .= $DNA[$random];
    }
    print ">Secuencia_$j\n$salida\n";
}
exit;

Re: Compare two variables
by Cristoforo (Curate) on May 19, 2014 at 22:07 UTC
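    This counts the matching characters. XOR-ing two strings (^) gives a "\0" byte at every position where the characters are equal, and tr/\0// counts those bytes.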
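    use strict;
    use warnings;

    chomp(my $orig = <DATA>);            # first line: the original sequence
    my $i = 0;
    while (my $line = <DATA>) {
        chomp $line;                     # drop the newline so it isn't counted as a match
        my $matches = $orig ^ $line;     # "\0" wherever the characters are equal
        printf "Sequence %d has %d identities\n", $i++, $matches =~ tr/\0//;
    }
    __DATA__
    ATCGGGACG
    TCGTCAGCG
    ATGCGAAAA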
    Prints:
    Sequence 0 has 2 identities
    Sequence 1 has 4 identities
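Applied to the original poster's setup, the same XOR trick could be wrapped in a small sub and run against randomly generated sequences. This is only a sketch, not Cristoforo's code: the sub name identities is hypothetical, and the original sequence is hard-coded where the real program would read it from original.FASTA. Each random sequence is made the same length as the original, since XOR lines the characters up position by position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the positions where two equal-length strings hold the same character.
# String XOR (^) yields "\0" wherever the characters match, and tr/\0//
# counts those bytes without changing the string.
sub identities {
    my ($x, $y) = @_;
    my $diff = $x ^ $y;
    return $diff =~ tr/\0//;
}

# Assumption: hard-coded here; the real program reads it from original.FASTA.
my $orig  = 'ATCGGGACG';
my @bases = split //, 'ACGT';

for my $j (0 .. 99) {
    # Random sequence with the same length as the original
    my $seq = join '', map { $bases[rand @bases] } 1 .. length $orig;
    printf "Sequence %d has %d identities\n", $j, identities($orig, $seq);
}
```

With the 541-base original from the FASTA file, only the assignment to $orig would change; the sub does not care about the length as long as both strings match.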

      Thanks for the answer, but I didn't understand the line with DATA. What exactly does DATA contain?

        Here's essentially the same thing without __DATA__:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $seq = 'ATCGGGACG';
         my $n = 0;
         ;;
         for my $s (qw(TCGTCAGCG ATGCGAAAA TTTTTTTTT TATTTTTTT)) {
           print qq{\n'$seq' original};
           my $idents = (my $same = $s ^ $seq) =~ tr/\x00//;
           $same =~ tr/\x00-\xff/^ /;
           print qq{'$s' seq }, $n++;
           print qq{ $same $idents identities};
         }
        "

        'ATCGGGACG' original
        'TCGTCAGCG' seq 0
                ^^ 2 identities

        'ATCGGGACG' original
        'ATGCGAAAA' seq 1
         ^^  ^ ^   4 identities

        'ATCGGGACG' original
        'TTTTTTTTT' seq 2
          ^        1 identities

        'ATCGGGACG' original
        'TATTTTTTT' seq 3
                   0 identities
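On the DATA question itself: __DATA__ marks the end of the program, and Perl makes every line after it readable through the predefined DATA filehandle, so <DATA> behaves exactly like reading from an opened file. The sketch below uses an ordinary filehandle in its place. An in-memory string stands in for a real file (an assumption made only so the example runs on its own); replacing it with open(my $fh, '<', 'sequences.txt') would read a hypothetical file sequences.txt with the original on the first line and one sequence per line after it.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file; <$fh> then works
# exactly like <DATA> in the reply above.
my $text = "ATCGGGACG\nTCGTCAGCG\nATGCGAAAA\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

chomp(my $orig = <$fh>);    # first line: the original sequence
my @counts;
while (my $line = <$fh>) {
    chomp $line;
    my $same = $orig ^ $line;          # "\0" where the characters match
    push @counts, ($same =~ tr/\0//);  # number of matching positions
}
close $fh;

printf "Sequence %d has %d identities\n", $_, $counts[$_] for 0 .. $#counts;
```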
Re: Compare two variables
by GotToBTru (Prior) on May 19, 2014 at 21:41 UTC

    A simple way to count matching characters in two strings (assuming they are the same length):

    my $l = length($string1);
    my $i = 0;    # start at zero so that no matches prints "0"
    while ($l--) {
        $i++ if substr($string1, $l, 1) eq substr($string2, $l, 1);
    }
    print "$i identities.\n";
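The same substr comparison can be packaged as a reusable sub. This is a sketch under one added assumption: when the strings differ in length, only the positions up to the shorter length are compared, whereas the loop above expects equal lengths. The name count_identities is hypothetical; min comes from the core module List::Util.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Count positions where two strings hold the same character, comparing
# only up to the length of the shorter string.
sub count_identities {
    my ($s1, $s2) = @_;
    my $n = 0;
    for my $pos (0 .. min(length $s1, length $s2) - 1) {
        $n++ if substr($s1, $pos, 1) eq substr($s2, $pos, 1);
    }
    return $n;
}

print count_identities('ATCGGGACG', 'TCGTCAGCG'), " identities.\n";   # 2 identities.
print count_identities('ATCGGGACG', 'ATGCGAAAA'), " identities.\n";   # 4 identities.
```

For long sequences the XOR approach in the other replies avoids the per-character substr calls, but this version is easy to read and to extend.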