ItsGinny has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm using perl for some basic file manipulation for a bioinformatics project.

TLDR: the index in my for loop isn't incrementing. :(

My intent for this program is to supply it with a .fasta file of genome sequences (in base-pair form, each with a header, and each labeled with a unique accession number, which is functionally an ID number) and a feature table (.txt). My goal is to trim the complete genome sequences down to only the part that corresponds to a certain gene (for me it is rbcL). Luckily, the annotations for where to trim are in the feature table file as a start and an end, each given as a simple value: the character count in the file at that start/end (example line: rbcL 65478 24534).

Before I can even begin to trim the sequences, however, I need to search the .fasta file (represented below by the array @arraybl) for the correct accession number (from the list @gbList) and start from that spot. I can gather all the accession numbers easily, and I have a subroutine that will splice the full file down to just the entry I am working with at that moment, but the counter ($lineIndex) in the for loop (c-style) I am using is not progressing from 0. I don't reset it anywhere, and I believe its format is correct within the for loop header. I am posting the code below (just the loops in question).

for (my $annoIndex = 0; $annoIndex <= $#gbList; $annoIndex++) {
    my $currGB = $gbList[$annoIndex];
    $startmark = 0;
    print "Current GB is $currGB\n";
    for (my $lineIndex = 0; $lineIndex <= $#arraybl; $lineIndex++){
        print "Line index is: $lineIndex\n";
        my $lineinfile = $arraybl[$lineIndex];
        print "line is: $lineinfile\n";
        if (index($lineinfile, $currGB) != -1 || $lineinfile =~ m/$currGB/){
            $startmark = $lineIndex+1;
            my @chunk = GetEntryLines($startmark);
            print "$currGB has been matched!\n";
            #print "Chunk Length is $#chunk\n";
            #-----------get rbcL & stuff------------
            last;
        }
        else {
            next;
        }
    }
}

GetEntryLines is the subroutine that splices the whole array (the file as lines) into the individual entry. The above loops are the only instances of $lineIndex in the code. Your help is so much appreciated!

Replies are listed 'Best First'.
Re: Index isn't progressing in C-style for loop
by graff (Chancellor) on Dec 20, 2015 at 23:40 UTC
    If the basic problem is that $lineIndex is not incrementing in the inner "for" loop, it must be because the "if" condition inside that loop is always coming up true on the first iteration of that loop, so you just skip to the next iteration of the outer "for" loop (over @gbList items).

    Have you tried stepping through this with the debugger (and including Data::Dumper or related module for inspecting data structures)? I'd put a break point at the "if" statement.

    When execution stops there, check the contents of the relevant variables and array elements to see whether they have unexpected values, or whether your choice of syntax is yielding unexpected results.
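As a rough sketch of that kind of inspection without the interactive debugger, the snippet below dumps the relevant values with Data::Dumper. Setting `$Data::Dumper::Useqq` makes otherwise invisible characters such as a trailing `\r` show up as escapes. The helper name `dump_value` and the sample data are made up for illustration; they are not from the original program.

```perl
use strict;
use warnings;
use Data::Dumper;

# Show control characters (\r, \t, ...) as escapes instead of raw bytes,
# and drop the "$VAR1 = " prefix from the output.
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;

# Hypothetical helper: return a printable dump of one value.
sub dump_value {
    my ($value) = @_;
    my $text = Dumper($value);
    chomp $text;
    return $text;
}

# Made-up sample data standing in for the real variables.
my @arraybl = (">AB000001 sample header\r", "ACGT\r");
my $currGB  = "AB000001";

print "currGB     = ", dump_value($currGB), "\n";
print "arraybl[0] = ", dump_value($arraybl[0]), "\n";
print "line count = ", scalar(@arraybl), "\n";
```

The same expressions can be examined interactively by running the script under `perl -d`, setting a breakpoint with `b` at the line of the "if", continuing with `c`, and evaluating values with `x`.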

Re: Index isn't progressing in C-style for loop
by Apero (Scribe) on Dec 20, 2015 at 23:27 UTC
    the counter ($lineIndex) in the for loop (c-style) I am using is not progressing from 0. I don't reset it anywhere, and I believe its format is correct within the for loop

    The $lineIndex counter is in the inner-loop; are you sure it's not just hitting the conditional and catching the last statement that aborts the loop? This would cause the counter in the inner loop to be 0 the next time it's processed at the next iteration of the outer-loop. Nothing I can see in the provided code should be changing (such as decrementing) the $lineIndex counter.
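The behaviour described here can be reproduced in isolation. The sketch below records which inner indexes are visited for each ID; the helper `visited_indexes` and the data are hypothetical. It also includes an empty ID, on the assumption that a blank entry could have slipped into the list: `index()` with an empty substring returns 0, so such an entry "matches" the very first line and the inner counter never gets past 0.

```perl
use strict;
use warnings;

# Made-up data: one real ID and one empty ID.
my @gbList  = ("AB000002", "");
my @arraybl = (">AB000001", "ACGT", ">AB000002", "GGCC");

# Hypothetical helper: record every inner index visited for each ID.
sub visited_indexes {
    my ($ids, $lines) = @_;
    my %seen;
    for my $id (@$ids) {
        for my $lineIndex (0 .. $#$lines) {
            push @{ $seen{$id} }, $lineIndex;
            # Leaving the inner loop here means the next ID starts at 0 again.
            last if index($lines->[$lineIndex], $id) != -1;
        }
    }
    return \%seen;
}

my $seen = visited_indexes(\@gbList, \@arraybl);
print "'$_' visited: @{ $seen->{$_} }\n" for @gbList;
```

If every ID in the real run behaves like the empty one here, the printed line index will always be 0 even though the loop itself is working as written.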

    Perhaps also try printing the value of your inner loop counter right before calling next, to verify that it still has the value you expect from the loop entry.

    Another hint for you, you probably don't want both index() and the regex match against m/$currGB/ since that'll treat the contents of $currGB as a regular expression. Stick with either index(), or look into escaping metacharacters using the quotemeta built-in.
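To illustrate the difference, here is a minimal sketch comparing `index()`, a regex escaped with `\Q...\E` (which applies quotemeta), and an unescaped regex. The accession string and the helper `matches_literally` are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: literal substring test written both ways.
sub matches_literally {
    my ($line, $id) = @_;
    my $by_index = index($line, $id) != -1;
    my $by_regex = $line =~ /\Q$id\E/;    # \Q...\E escapes metacharacters
    return ($by_index ? 1 : 0, $by_regex ? 1 : 0);
}

# Made-up accession with a version suffix; "." is a regex metacharacter.
my $id   = "NC_000932.1";
my $line = ">NC_000932x1 some other record";

my ($by_index, $by_regex) = matches_literally($line, $id);
my $unescaped = $line =~ /$id/ ? 1 : 0;   # the "." matches the "x"

print "index: $by_index, escaped regex: $by_regex, raw regex: $unescaped\n";
```

Only the unescaped regex reports a match for this line, which is the kind of false positive that escaping (or plain `index()`) avoids.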

    An even more nit-picky style suggestion for you would be to reduce indent level by putting the loop continuation first by reversing the conditional. This is mostly pure style/design preference, but code that needlessly adds to indent level frequently gets harder to maintain in the long-term. Consider:

    if ( index($lineinfile, $currGB) == -1 ) {
        print "Debugging: continuing the loop\n";
        next;
    }
    # Do your matching workload here.

    Update: Sans debugging, you can even use the single line:

    next unless ( index($lineinfile, $currGB) != -1 );

Re: Index isn't progressing in C-style for loop
by GrandFather (Saint) on Dec 21, 2015 at 00:12 UTC

    Your loops are much better written Perl style:

    for my $annoIndex (0 .. $#gbList) {

    instead of your current C style loops which are verbose, confusing and error prone.
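As one possible shape for both loops in that style, the sketch below walks the IDs directly and uses a range only where the index itself is needed. The helper `find_start_marks` and the sample data are assumptions for illustration; the real program calls GetEntryLines at the point where the start index is stored here.

```perl
use strict;
use warnings;

# Hypothetical helper: for each ID, return the index of the line after the
# first line containing it, or undef when nothing matches.
sub find_start_marks {
    my ($ids, $lines) = @_;
    my %start;
    for my $id (@$ids) {
        $start{$id} = undef;
        for my $lineIndex (0 .. $#$lines) {
            next if index($lines->[$lineIndex], $id) == -1;
            $start{$id} = $lineIndex + 1;
            last;
        }
    }
    return \%start;
}

# Made-up data standing in for the accession list and the .fasta lines.
my @gbList  = ("AB000002", "AB000009");
my @arraybl = (">AB000001", "ACGT", ">AB000002", "GGCC");

my $start = find_start_marks(\@gbList, \@arraybl);
for my $id (@gbList) {
    my $mark = defined $start->{$id} ? $start->{$id} : "not found";
    print "$id starts at line index $mark\n";
}
```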

    Premature optimization is the root of all job security
Re: Index isn't progressing in C-style for loop
by johngg (Canon) on Dec 20, 2015 at 23:30 UTC

    Without a little more context and an example of your data it is difficult to make any suggestions. Possibly a silly question but are you certain that @arraybl contains more than one element?
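A quick way to check is to print `scalar(@arraybl)` right after the file is read. The sketch below shows one way the array could end up with a single element, assuming (this is a guess, not something the post confirms) that the file uses CR-only line endings while Perl is splitting on "\n". The helper `read_lines` and the in-memory file contents are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from a filehandle into an array.
sub read_lines {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp @lines;
    return @lines;
}

# In-memory stand-ins for the .fasta file (made-up content).
my $lf_text = ">AB000001\nACGT\n>AB000002\nGGCC\n";
my $cr_text = ">AB000001\rACGT\r>AB000002\rGGCC\r";    # CR-only endings

for my $text ($lf_text, $cr_text) {
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @arraybl = read_lines($fh);
    close $fh;
    print "read ", scalar(@arraybl), " line(s)\n";
}
```

If the real file reports one line, every accession number sits on that single "line", so the match always fires at index 0.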

    Cheers,

    JohnGG