http://qs1969.pair.com?node_id=522402

ktsirig has asked for the wisdom of the Perl Monks concerning the following question:

Hi all! I have an assignment to hand in for Biology class and I am stuck. Assignment: I am given a sequence of letters, say:
XXXXXXABCDXXXXXXXX
and I am interested in the part ABCD, which occupies letters #7-#10, as you can see. I am then given the same sequence, which now also contains the characters * and ! (only these two are allowed), say:
XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX
and I want to find out which positions the part ABCD now occupies. If you count, you see that ABCD now spans letters #14-#20 (7 * and ! characters were added before A and 10 before D).
What I think must be done is:
1) check how many (if any) * and/or ! were added before the start letter A (#7)
2) check how many (if any) * and/or ! were added before the end letter D (#10)
3) add those counts to the start and end positions of the part ABCD
Has anyone got any hints to give me as to which functions of Perl will be useful for this problem?
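
The three steps above can be sketched as a single pass over the edited string: count only the real letters and note the 1-based position at which that count reaches the original start and end. This is only a sketch under assumptions: the sub name `map_range` is made up for illustration, the original positions are taken to be 1-based, and `*` and `!` are assumed to be the only inserted characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): map a 1-based range in the
# original sequence to the 1-based range in the edited sequence.
# Assumes '*' and '!' are the only characters that were inserted.
sub map_range {
    my ($edited, $start, $end) = @_;
    my ($letters, $new_start, $new_end) = (0);
    for my $i (0 .. length($edited) - 1) {
        my $ch = substr $edited, $i, 1;
        next if $ch eq '*' or $ch eq '!';    # inserted marker, not a letter
        $letters++;                          # original letter number $letters
        $new_start = $i + 1 if $letters == $start;
        if ($letters == $end) {
            $new_end = $i + 1;
            last;
        }
    }
    return ($new_start, $new_end);           # undef where a position was not reached
}

my ($s, $e) = map_range('XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX', 7, 10);
print "#$s-#$e\n";    # #14-#20
```

Working from positions rather than from the letters A and D means the sketch does not depend on those letters being unique in the sequence.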

Re: confused with strings
by abcde (Scribe) on Jan 11, 2006 at 11:12 UTC

    I thought of the function index first, which gets the position of a string (or character, in this case).

    my $seq = "XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX";

    # The code presumes that it's a correct sequence.
    # If you're getting the sequence from somewhere else you
    # should check it to make sure it contains the right letters.
    # index() counts from 0, so add 1 to get the 1-based positions
    # used in the question.
    print "#" . (index($seq, "A") + 1) . "-#" . (index($seq, "D") + 1) . "\n";

    This is the easiest way; your method works too, but it would involve splitting up the string and operating on that, so using index is easier.

Re: confused with strings
by borisz (Canon) on Jan 11, 2006 at 10:36 UTC
    $_ = 'XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX';
    /A[!\*]*B[!\*]*C[!\*]*D/g
        and print '#', pos() + 1 - length($&), '-#', pos();
    __OUTPUT__
    #14-#20
    Boris
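
The same idea extends to any target substring by building the pattern from its letters. What follows is a sketch rather than Boris's code: the sub name `find_span` is hypothetical, and it reads the match offsets from Perl's built-in `@-` and `@+` arrays instead of using `$&` and `pos()`.

```perl
use strict;
use warnings;

# Hypothetical helper: find $target in $seq, allowing any run of '*' or '!'
# between consecutive letters. Returns 1-based (start, end), or an empty
# list when there is no match.
sub find_span {
    my ($seq, $target) = @_;
    # quotemeta guards against letters that are regex metacharacters
    my $pattern = join '[*!]*', map { quotemeta($_) } split //, $target;
    return unless $seq =~ /$pattern/;
    # $-[0] is the 0-based offset of the match start,
    # $+[0] is the offset just past the end of the match
    return ($-[0] + 1, $+[0]);
}

my ($start, $end) = find_span('XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX', 'ABCD');
print "#$start-#$end\n";    # #14-#20
```

Because the whole substring is matched at once, a stray A or D elsewhere in the sequence does not confuse it, provided the full run of target letters appears only where intended.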
Re: confused with strings
by GrandFather (Saint) on Jan 11, 2006 at 11:40 UTC

    Are you given ABCD, or are you given the position and length (or start and end) of the substring? Can the * and ! characters occur anywhere? It would help to give a before and after sample. Something like this:

    __DATA__
    XXXXXXABCDXXXXXXXX
    XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX

    Want to print:

    10 characters added
    XXXXXXABCD**!!!***!*XX*!!!XXX**XXX

    The following code may be a good starting point:

    use strict;
    use warnings;

    my $match = 'ABCD';

    while (<DATA>) {
        my $org = $_;
        defined (my $mutated = <DATA>) or die "Missing edited line";
        my $segment = substr $mutated, 0, index ($mutated, substr $match, 3, 1) + 1;
        my $suffix = substr $mutated, index ($mutated, substr $match, 3, 1) + 1;
        (my $pInsert = $segment) =~ tr/*!//cd;
        (my $pSegment = $segment) =~ tr/*!//d;
        print length ($pInsert) . " characters added\n";
        print "$pSegment$pInsert$suffix\n";
    }

    __DATA__
    XXXXXXABCDXXXXXXXX
    XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX

    Note that some error checking should be added and that there are a few assumptions about the match string and what the X sequences can actually contain.


    DWIM is Perl's answer to Gödel
Re: confused with strings
by Perl Mouse (Chaplain) on Jan 11, 2006 at 11:03 UTC
    If there is just one A and one D, you can simply use the index function to find the indices of the characters in the string.
    Perl --((8:>*
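
The "just one A and one D" precondition can be checked before trusting `index`. A minimal sketch, assuming the edited sequence from the question; `unique_pos` is a hypothetical name, and the check relies on `index` returning -1 when the character is not found.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based position of $ch in $seq,
# or undef unless $ch occurs exactly once.
sub unique_pos {
    my ($seq, $ch) = @_;
    my $first = index $seq, $ch;
    return undef if $first < 0;                          # not present
    return undef if index($seq, $ch, $first + 1) >= 0;   # present more than once
    return $first + 1;                                   # index is 0-based
}

my $seq = 'XX**XXX!!!X**AB*!C*DXX*!!!XXX**XXX';
my ($a_pos, $d_pos) = map { unique_pos($seq, $_) } qw(A D);
if (defined $a_pos and defined $d_pos) {
    print "#$a_pos-#$d_pos\n";    # #14-#20
}
else {
    print "A or D is missing or repeated; index alone is not enough\n";
}
```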