PerlMonks  

comparing strings

by Anonymous Monk
on Oct 30, 2002 at 14:53 UTC ( [id://209086]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi. My problem is that I am trying to write a script that compares two DNA strings of exactly the same length, position by position, e.g. string1 ACTA and string2 AGTC, so I want to compare both first letters, both second letters, etc. If the letters at a particular position are different, I want to print a '*' for the mismatch. I have tried to do this, but the following bit of code doesn't work. Any suggestions much appreciated.
# e.g @lines1 contains CACTATGAGTGATCGC and @lines2 contains
# ACTGACTAATGCGTTG.
foreach $base (@lines1, @lines2) {
    shift @lines1;
    shift @lines2;
    if (@lines1[0] eq @lines2[0]) {
        last;
    }
    else {
        print "*\n";
    }
}

Replies are listed 'Best First'.
Re: comparing strings
by davorg (Chancellor) on Oct 30, 2002 at 15:29 UTC
    for (0 .. $#lines1) {
        print $lines1[$_] eq $lines2[$_] ? $lines1[$_] : '*';
    }
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: comparing strings
by blokhead (Monsignor) on Oct 30, 2002 at 15:30 UTC
    It might be easier for you to use just strings rather than the overhead of arrays for this. Use the substr function to get the nth character in the sequence.
    my $dna1 = 'CACTATGAGTGATCGC';
    my $dna2 = 'ACTGACTAATGCGTTG';

    print "$dna1\n$dna2\n";
    for (0 .. length($dna1)-1) {
        if (substr($dna1,$_,1) eq substr($dna2,$_,1)) {
            print ' ';
        } else {
            print '*';
        }
    }
    print "\n";

    __END__
    # prints this:
    CACTATGAGTGATCGC
    ACTGACTAATGCGTTG
    **** ** *  *****
    Also, there are quite a few problems in your original code. First, the value that you want to loop/iterate over is the number N when you are looking at the Nth character. But the foreach loop will set $base to each of the values in @lines1 and then to each of the values in @lines2. You cannot get a pair of elements from two different arrays using a foreach loop over the arrays themselves. Also note that inside the loop, you are comparing the first elements of the arrays each time. You want to compare a different pair of elements each time through the loop. Lastly, @lines1[0] is a one-element array slice rather than a single element (Perl warns about it under -w, at least until Perl6), and what you need to say instead is $lines1[0].
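The points above can be sketched in a few lines. This is only an illustration, assuming each array holds one base per element; the helper name mismatch_marks and the counter $visited are made up for the example and do not come from the original code.

```perl
use strict;
use warnings;

# Assumption: one base per array element.
my @lines1 = split //, 'ACTA';
my @lines2 = split //, 'AGTC';

# foreach over (@lines1, @lines2) walks one flattened list of all
# elements of both arrays, so it never hands you a pair to compare.
my $visited = 0;
$visited++ for (@lines1, @lines2);

# Looping over the index N gives access to the Nth element of each array.
# Note $first->[$_] (a single element), not a slice like @lines1[0].
sub mismatch_marks {
    my ($first, $second) = @_;    # array references of equal length
    return join '',
        map { $first->[$_] eq $second->[$_] ? ' ' : '*' } 0 .. $#$first;
}

print "foreach visited $visited elements\n";
print join('', @lines1), "\n";
print join('', @lines2), "\n";
print mismatch_marks(\@lines1, \@lines2), "\n";
```

Passing array references keeps the two arrays separate inside the sub; passing the arrays directly would flatten them into one list, for the same reason the foreach loop does.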

    HTH and keep trying!

    blokhead

      Thanks blokhead, but I am still having a few problems... I incorporated the snippet you gave me, which works fine except that it doesn't print the correct number of *'s to correspond to the number of mismatches! This is what I did; it's slightly different from yours.
      for ($dna_counter = 0; $dna_counter < $length_dna ; $dna_counter++ ) {
          if (substr(@lines, $_, 1) eq substr (@lines2, $_, 1)) {
              print '';
          }
          else {
              print "*";
          }
      }
      print "\n";
        Of course, if you're using arrays, you just need
        if ($lines[$_] eq $lines2[$_]) {
        The following will replace the non-matching chars in the string $str1 with "*" (and leave the matching chars) by converting $str2 to a regex:
        $str2 =~ s/./(?:($&)|.)/g;
        print map{$_||'*'} $str1 =~ /$str2/


          p
        This is what i did its slightly different from yours.
        ... if (substr(@lines, $_, 1) eq substr (@lines2, $_, 1)) ...
        Well, that's actually way different from what blokhead was doing -- he was using "substr()" on a scalar ($dna1); substr() is not supposed to be used on an array (@lines).
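A minimal sketch of how the follow-up loop could be repaired, assuming each sequence was read into the first element of its array (the thread never says how @lines and @lines2 were filled, so that part is a guess). The helper name mark_mismatches is hypothetical.

```perl
use strict;
use warnings;

# Assumption: each array holds the whole sequence in element 0.
my @lines  = ('CACTATGAGTGATCGC');
my @lines2 = ('ACTGACTAATGCGTTG');

sub mark_mismatches {
    my ($dna1, $dna2) = @_;    # two scalars of equal length
    my $marks = '';
    for my $pos (0 .. length($dna1) - 1) {
        # Index with the loop variable itself; the follow-up code
        # counted with $dna_counter but indexed with $_.
        # Emit a space (not an empty string) on a match so the
        # stars stay lined up with the sequences.
        $marks .= substr($dna1, $pos, 1) eq substr($dna2, $pos, 1) ? ' ' : '*';
    }
    return $marks;
}

print "$lines[0]\n$lines2[0]\n";
print mark_mismatches($lines[0], $lines2[0]), "\n";
```

Handing substr() a scalar such as $lines[0] is the key change; the rest mirrors blokhead's loop.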
Re: comparing strings
by termix (Beadle) on Oct 30, 2002 at 15:23 UTC

    I think getting one scalar out of a pair of arrays is going to give you one list (rather than the pair you want to compare). You can try this by adding a print $base line right inside the loop and seeing how many times it loops. Also, unless lines1 is an array of arrays, you want to get your character from it through a scalar like $lines1[0].

    Depending on how good you are with Perl you might find the following alternative too simple (or too C-like), but the foreach loop can replace the for loop (I think). And I never liked shifting.

    # e.g @lines1 contains CACTATGAGTGATCGC and @lines2 contains
    # ACTGACTAATGCGTTG.
    @lines1= split(//,"CACTATGAGTGATCGC");
    @lines2= split(//,"ACTGACTAATGCGTTG");

    $line1length=@lines1;
    $line2length=@lines2;

    print ("1 :",@lines1,"\n");
    print ("2 :",@lines2,"\n");

    if ($line1length!=$line2length) {
        print "Length Mismatch\n";
    } else {
        print "M :";
        for ($x=0;$x<$line1length;$x++) {
            if ($lines1[$x] eq $lines2[$x]) {
                print $lines1[$x];
            } else {
                print "*";
            }
        }
        print "\n";
    }

    -- termix

(tye)Re: comparing strings
by tye (Sage) on Oct 30, 2002 at 19:14 UTC

    This is what came to my mind:

    my $line1= "CACTATGAGTGATCGC";
    my $line2= "ACTGACTAATGCGTTG";
    my $diff= $line1 ^ $line2;    # Do a bit-wise xor
    $diff =~ tr/\0-\xff/ */;      # Change zero bytes to " ", others to "*"
    print "$line1\n$diff\n$line2\n";
    which prints
    CACTATGAGTGATCGC
    **** ** *  *****
    ACTGACTAATGCGTTG
    I'm not certain it would be a good fit for you, though.
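If only the number of mismatches is wanted, the same xor trick can be turned into a counter. This is a rough sketch rather than part of the reply above: the sub name count_mismatches is invented here, and it assumes both arguments are plain byte strings of equal length.

```perl
use strict;
use warnings;

sub count_mismatches {
    my ($line1, $line2) = @_;
    # String xor yields a NUL byte wherever the two characters agree.
    my $diff = $line1 ^ $line2;
    # tr with an empty replacement list changes nothing and returns
    # how many characters fell in the search range (the non-NUL bytes).
    return ($diff =~ tr/\x01-\xff//);
}

print count_mismatches('CACTATGAGTGATCGC', 'ACTGACTAATGCGTTG'), "\n";
```

With strings of unequal length, the extra characters of the longer one are xor-ed against nothing and so count as mismatches, which may or may not be what you want.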

            - tye (/me shifts his cart into "overgolf")
Re: comparing strings
by artist (Parson) on Oct 30, 2002 at 15:47 UTC
    Hi,
    Try this 'compact' version if the lengths are the same.
    @lines1= split(//,"CACTATGAGTGATCGC");
    @lines2= split(//,"ACTGACTAATGCGTTG");
    @array = map { $lines1[$i++] eq $_ ? $lines1[$i-1] : '*' } @lines2;
    print join "",@array;

    Artist

Re: comparing strings
by Aristotle (Chancellor) on Oct 30, 2002 at 15:28 UTC
    I'll assume your arrays contain a single character per element, though that's unclear from your description.
    while(@lines1 > 0 and @lines2 > 0) {
        last if shift(@lines1) ne shift(@lines2);
    }
    print "*\n" if @lines1 > 0 or @lines2 > 0;

    Makeshifts last the longest.

Node Type: perlquestion [id://209086]
Approved by valdez