You are actually very close on this one. While there are a few errors in the code snippet provided, you are on the right track with the use of the substr function.

The key to solving this problem (as far as I understand it) is the fourth argument of the substr function, which supplies a replacement string for the selected portion of a scalar string. This allows the relevant regions of the DNA strand to be replaced as follows:

substr( $dna, $start + 1, $finish - $start, "X" x ( $finish - $start ) );
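To make the effect of that call concrete, here is a minimal sketch on a short string. The sample sequence and the positions 2 and 6 are made up for illustration and do not come from the original question; the offsets follow the same convention as the line above.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original question.
my $dna = "ACGTACGTAC";
my ( $start, $finish ) = ( 2, 6 );

# Four-argument substr: replace LENGTH characters beginning at OFFSET
# with the replacement string. The call returns the text it replaced.
my $removed = substr( $dna, $start + 1, $finish - $start, "X" x ( $finish - $start ) );

print "$dna\n";        # ACGXXXXTAC
print "$removed\n";    # TACG
```

Because the replacement has the same length as the region it overwrites, the overall length of $dna does not change, so later positions still line up.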

The major problem you appear to be facing with the existing code is that $finish is never defined at any point. Note that @finish and $finish are separate, unrelated variables, and that scoping issues also come into play in the code provided. It would be worth your while to read Coping with Scoping, written by Dominus.
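As a small illustration of both points, the sketch below uses made-up values (none of them come from your code) to show that an array and a scalar of the same name do not share data, and that a variable declared with my inside a block is not visible outside it.

```perl
use strict;
use warnings;

my @finish = ( 10, 20, 30 );    # an array
my $finish = 99;                # a separate scalar that merely shares the name

# $finish[0] is an element of @finish; plain $finish is the other variable.
print "$finish[0] $finish\n";   # 10 99

my $last = "unset";
foreach my $value (@finish) {
    my $inner = $value;         # $inner exists only inside this loop body
    $last = $value;             # $last was declared outside, so the value persists
}

# Referring to $inner here would be a compile-time error under "use strict".
print "$last\n";                # 30
```

Running code under use strict is a cheap way to catch this class of mistake, since an undeclared $finish would be reported at compile time rather than silently evaluating to undef.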

The following is how I would rewrite the code snippet provided. Note that it is untested and is provided for illustrative purposes only:

    my $dna;

    # Build the sequence one line at a time, stripping newlines as we go
    open DNA, "<dna.file" or die $!;
    while (<DNA>) {
        chomp;
        $dna .= $_;
    }
    close DNA;

    # Each line of the positions file holds a start and a finish position
    open POS, "<positions.input" or die $!;
    while (<POS>) {
        chomp;
        my ($start, $finish) = split /\s+/;
        substr( $dna, $start + 1, $finish - $start, "X" x ( $finish - $start ) );
        print $dna, "\n";
    }
    close POS;

This code snippet differs from the one you provided in a few ways. Firstly, the $dna string is populated differently: the example reads each line of the supplied file, chomps the new-line character and concatenates the input onto $dna, rather than slurping the file into an array first. The code then steps through the positions input file and modifies the $dna scalar accordingly, replacing each intron/exon region with an 'X'-string of identical length using the substr function. In this fashion, the start and finish bases do not need to be stored beyond their use in the loop.
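The same masking step can be tried without any input files. The following is only a sketch: mask_ranges is a hypothetical helper name, the sequence and position pairs are invented, and the bounds checks are my own addition on the assumption that positions outside the string should simply be skipped or clamped.

```perl
use strict;
use warnings;

# Hypothetical helper: mask each [start, finish] pair in a copy of $dna,
# using the same offset convention as the substr call shown earlier.
sub mask_ranges {
    my ( $dna, @ranges ) = @_;
    foreach my $range (@ranges) {
        my ( $start, $finish ) = @$range;
        my $offset = $start + 1;
        my $length = $finish - $start;

        # Assumption: silently skip ranges that are empty or start outside the string.
        next if $length <= 0 || $offset < 0 || $offset >= length($dna);

        # Clamp the length so the replacement never makes the string longer.
        my $room = length($dna) - $offset;
        $length = $room if $length > $room;

        substr( $dna, $offset, $length, "X" x $length );
    }
    return $dna;
}

my $masked = mask_ranges( "ACGTACGTACGT", [ 0, 3 ], [ 7, 9 ] );
print "$masked\n";    # AXXXACGTXXGT
```

Since the helper works on its own copy of the string, the caller's original sequence is left untouched, which makes it easy to compare the masked and unmasked versions.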

The snippet of code provided raises a few other questions in my mind, specifically about the definition and usage of other variables such as $count and $base. Are these variables used elsewhere in your code, beyond the bounds of the snippet provided? If not, they appear to be entirely unnecessary.

 


In reply to Re: in need of wisdom by rob_au
in thread in need of wisdom: handling DNA strings by lolly
