reebee3 has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I am trying to write a script that detects SNPs and their position in DNA sequences. Simply put, A SNP is when there is a change in A, T, C, or G between sequences, therefore...

A could change to T, C, to G

T could change to C, G, or A

G could change to C, T or A

C could change to G, T or A

For Example:

Sequence 1: ATGGAT

Sequence 2: ACGGAG

There are 2 SNPs here.

SNP 1 is the T --> C is position 2 of the sequence. SNP 2 is the T -->G in position 6 of the sequence.

Criteria for the script:

Only 2 sequences can be entered on the command line The 2 sequences must be equal length Find the SNP and the position of the SNP (starting from position 1 not 0) Print out data- for example using the sample above

Pos 2: T => C

Pos 6: T =>G

Found 2 SNPs.

This is what I have so far, but I am not sure how to proceed...

#!/usr/bin/env perl # file: 08-snps.pl use strict; use warnings; use autodie; my $number_seqs = scalar @ARGV; if($number_seqs != 2){ print "Please provide two sequences \n"; } for (my $i=0; $i<$number_seqs; $i++){ if(length($ARGV[$i]) != length($ARGV[0])){ print "Please ensure the sequences are the same length \n"; } } my $var = $ARGV[0]; print $var,"\n"; my $var2 = my $var[0]; print $var2,"\n";

Replies are listed 'Best First'.
Re: Help tracking changes in array
by toolic (Bishop) on Oct 08, 2015 at 18:13 UTC
    Doesn't BioPerl already do this stuff? Regardless, you should die if the user doesn't give you what you want. Then, you can split each seq into characters and loop over them:
    use warnings; use strict; die "provide two sequences \n" unless @ARGV == 2; die "ensure the sequences are the same length \n" unless length($ARGV[ +0]) == length($ARGV[1]); my @s1 = split //, $ARGV[0]; my @s2 = split //, $ARGV[1]; my $cnt = 0; for my $i (0 .. $#s1) { if ($s1[$i] ne $s2[$i]) { printf "pos %d: %s => %s\n", $i+1, $s1[$i], $s2[$i]; $cnt++; } } print "snps = $cnt\n";
Re: Help tracking changes in array
by kcott (Archbishop) on Oct 08, 2015 at 18:35 UTC

    G'day reebee3,

    Welcome to the Monastery.

    I'd use substr to get the individual bases. Then a simple comparison will identify your SNPs.

    #!/usr/bin/env perl -l use strict; use warnings; my ($seq1, $seq2) = qw{ATGGAT ACGGAG}; my $snp = 0; for my $pos (1 .. length $seq1) { my $base1 = substr $seq1, $pos - 1, 1; my $base2 = substr $seq2, $pos - 1, 1; next if $base1 eq $base2; ++$snp; print "SNP $snp is the $base1 --> $base2 at position $pos"; }

    Output:

    SNP 1 is the T --> C at position 2 SNP 2 is the T --> G at position 6

    — Ken

Re: Help tracking changes in array
by stevieb (Canon) on Oct 08, 2015 at 18:14 UTC

    Welcome to the Monastery, reebee3!

    First, your title is a bit misleading, because you're not really trying to find changes in an array, you're trying to find the position of changes between two strings.

    With that said, in the example below, I transform the strings into lists (similar to arrays, but not the same) using split, and compare one letter at a time in the loop. $_ contains the number for each iteration in the for loop (which starts at zero) and add one to it to create $pos.

    use strict; use warnings; use autodie; my $number_seqs = scalar @ARGV; if($number_seqs != 2){ print "Please provide two sequences \n"; } if (length $ARGV[0] != length $ARGV[1]){ print "Please ensure the sequences are the same length \n"; exit; } for (0..(length $ARGV[0]) - 1){ my $first = (split //, $ARGV[0])[$_]; my $second = (split //, $ARGV[1])[$_]; if ($first ne $second){ my $pos = $_++; print "pos $pos: $first => $second\n"; } }

    I'm pretty sure there are much more efficient ways to do this, but I'm drawing a blank right now.

Re: Help tracking changes in array
by Anonymous Monk on Oct 08, 2015 at 18:42 UTC
    #!/usr/bin/perl # http://perlmonks.org/?node_id=1144211 use strict; use warnings; @ARGV or @ARGV = qw( ATGGAT ACGGAG ); @ARGV == 2 or die "usage: $0 seq seq\n"; my ($left, $right) = @ARGV; $_ = $left ^ $right; print "Pos ", $+[0], ": ", substr($left, $-[0], 1), ' => ', substr($right, $-[0], 1), "\n" while /[^\0]/g;