mdunnbass has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,

I've never really had a wonderful grasp of scoping, and it is seriously biting my digital ass at the moment. Any help would be greatly appreciated.

I am searching through the elements (all are numeric) of the terminal arrays of a HoHoA and trying to parse them into grouped sets in a new HoAoHoA, based on their proximity to each other. Unfortunately, the output I'm getting involves arrays of identical elements, and missing key/value pairs of hashes. For background info, the numerical array elements are the positions of hits to regex searches performed on ~500 Mb text files containing biological genome data separated into FASTA sequences.

Here's the code that's causing the problems. (Also, the print statements and Dumper are solely for debugging purposes. They will be removed once the code is debugged.)

    FASTA:
    for (my $h=0;$h<(@fastarray);$h++) {
        $setscounter = 0;
        if (defined %{$matches{$fastarray[$h]}}){
            SITE:
            for $site (sort {$a <=> $b } keys %{$matches{$fastarray[$h]}}) {
                if (@{$matches{$fastarray[$h]}{$site}}) {
                    $i = 0;
                    ELEM:
                    while ($i < length(@{$matches{$fastarray[$h]}{$site}})) {
                        $lowerlimit = 0;
                        my $low = $matches{$fastarray[$h]}{$site}[$i];
                        $lowerlimit = $low + 0;
                        $upperlimit = $span + $lowerlimit;
                        SITEKEY:
                        for $sitekey (sort {$a <=> $b } keys %{$matches{$fastarray[$h]}}) {
                            if (@{$matches{$fastarray[$h]}{$sitekey}}) {
                                my @arrayA = ();
                                my $hit = 0;
                                SET:
                                while ($hit < length(@{$matches{$fastarray[$h]}{$sitekey}} )) {
                                    print "...in \$matches{$fastarray[$h]}{$sitekey}[$hit]\n";
                                    if ($matches{$fastarray[$h]}{$sitekey}[$hit] >= $lowerlimit
                                        && $matches{$fastarray[$h]}{$sitekey}[$hit] <= $upperlimit) {
                                        push (@arrayA, $matches{$fastarray[$h]}{$sitekey}[$hit]);
                                        $hit++;
                                        next SET;
                                    #closes If setcount
                                    } elsif ($matches{$fastarray[$h]}{$sitekey}[$hit] < $lowerlimit) {
                                        $hit++;
                                        next SET;
                                    } else {$hit++;}
                                #closes while hit
                                }
                                if (@arrayA && $hit == length(@{$matches{$fastarray[$h]}{$sitekey}})) {
                                    $sets{$fastarray[$h]}[$setscounter]{$sitekey} = \@arrayA;
                                    print "\$sets{$fastarray[$h]}[$setscounter] is:\n";
                                    print Dumper(%{$sets{$fastarray[$h]}[$setscounter]});
                                    @arrayA = ();
                                } else {
                                    $sets{$fastarray[$h]}[$setscounter]{$sitekey} = undef;
                                }
                                $hit = 0;
                                next SITEKEY;
                            #closes if matches
                            }
                            next SITEKEY;
                        #closes SITEKEY
                        }
                        $setscounter++;
                        $i++;
                        next ELEM;
                    # closes while i
                    }
                #closes if matches
                }
                next SITE;
            #closes for my site
            }
        #closes if defined
        }
        next FASTA;
    #closes for fastarray
    }

The meat of the routine occurs within and just after the SET label, where I push the element [$hit] into @arrayA. Ideally, this should collect into an array every element of the array stored under the $sitekey key that is >= $lowerlimit && <= $upperlimit. But whenever @arrayA holds multiple elements, they are all stored as identical values, so something is wrong. I assume it's a scoping problem, tho' I could be wrong.

Here's some of the output, slightly annotated to explain what I really want:

    ...in $matches{>scaffold_446}{0}[1]
    $sets{>scaffold_446}[1] is:
    $VAR1 = '0';
    $VAR2 = [
              0,     # These 2 array elements should not BOTH
              0      # be 0!!! What did I do wrong?
            ];
    ...in $matches{>scaffold_446}{1}[0]
    ...in $matches{>scaffold_446}{2}[0]
    ...in $matches{>scaffold_446}{0}[0]
    ...in $matches{>scaffold_446}{0}[1]
    ...in $matches{>scaffold_446}{1}[0]
    $sets{>scaffold_446}[2] is:
    $VAR1 = '1';     # For each of these different '...in $matches..."
    $VAR2 = [        # There should be 6 $VAR variables. corresponding
              40048  # to 3 key/value pairs, but I am doing something wrong
            ];       # in my code, such that sometimes there are 3 pairs, but
    $VAR3 = '0';     # often not. Any thoughts?
    $VAR4 = undef;
    ...in $matches{>scaffold_446}{2}[0]
    ...in $matches{>scaffold_446}{0}[0]
    ...in $matches{>scaffold_446}{0}[1]
    ...in $matches{>scaffold_446}{1}[0]
    ...in $matches{>scaffold_446}{2}[0]
    $sets{>scaffold_446}[3] is:
    $VAR1 = '1';
    $VAR2 = undef;
    $VAR3 = '0';
    $VAR4 = undef;
    $VAR5 = '2';
    $VAR6 = [
              5468
            ];
    ...in $matches{>scaffold_3198}{0}[0]
    $sets{>scaffold_3198}[0] is:
    $VAR1 = '0';
    $VAR2 = [
              1829
            ];

Any ideas, thoughts, or help are greatly welcomed and appreciated. Anyone who needs more info to help, please just ask me, and I tell you ev'ryt'ing.

Thanks,
Matt

Replies are listed 'Best First'.
Re: Scoping problems in nested loops
by GrandFather (Saint) on Nov 14, 2006 at 20:52 UTC

    length(@{$matches{$element}{$site}}) looks rather surprising to me. Consider:

        my @array = ('banana', 'apple', 'orange');
        print length (@array);

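    Prints '1'. length evaluates @array in scalar context, which gives the element count (3), and then returns the length of the string "3". Most likely what you really wanted was the number of elements in the array. That is simply @array in a scalar context. However, it seems like you really want to iterate over the array, so a better construct would be: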

    for my $low (@{$matches{$element}{$site}}) {

    and omit the following line initialising $low. There are a number of places where the same thing seems to have been done.
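
To make the difference concrete, here is a minimal sketch. The array @hits and its values are made up for illustration; they are not taken from the original data. With twelve elements, length reports 2 (the number of digits in "12"), so a while loop bounded by it would only ever visit the first two elements.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one of the arrays of hit positions.
my @hits = (10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120);

# length() wants a string: the array is evaluated in scalar context
# (giving 12) and the length of the string "12" is returned.
# Warnings are switched off here only because newer perls flag this mistake.
my $digits = do { no warnings; length(@hits) };

# The element count is simply the array in scalar context.
my $count = scalar @hits;

print "length(\@hits) = $digits\n";
print "scalar \@hits  = $count\n";

# Iterating over the array directly needs no index or bound at all.
my @seen;
for my $low (@hits) {
    push @seen, $low;
}
print "visited ", scalar(@seen), " elements\n";
```

This should report 2 for length and 12 for both the scalar count and the number of elements visited.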

    Other changes I'd make involve early exits from loops rather than nesting inside if statements, removing duplicated code, removing superfluous nexts, using Perl for loops rather than C for loops and adding a little vertical whitespace to make flow clearer. At the end of that process I get:

        my @fastarray;
        my %matches;
        my %sets;
        my $span;

        for my $element (@fastarray) {
            my $setscounter = 0;

            next unless defined %{$matches{$element}};

            for my $site (sort {$a <=> $b } keys %{$matches{$element}}) {
                next unless @{$matches{$element}{$site}};

                for my $low (@{$matches{$element}{$site}}) {
                    my $lowerlimit = $low + 0;
                    my $upperlimit = $span + $lowerlimit;

                    for my $sitekey (sort {$a <=> $b } keys %{$matches{$element}}) {
                        next unless @{$matches{$element}{$sitekey}};

                        my @arrayA = ();

                        for my $hElem (@{$matches{$element}{$sitekey}}) {
                            print "...in \$hElem\n";
                            if ($hElem >= $lowerlimit && $hElem <= $upperlimit) {
                                push (@arrayA, $hElem);
                            }
                        }

                        if (@arrayA) {
                            $sets{$element}[$setscounter]{$sitekey} = \@arrayA;
                            print "\$sets{$element}[$setscounter] is:\n";
                            print Dumper(%{$sets{$element}[$setscounter]});
                            @arrayA = ();
                        } else {
                            $sets{$element}[$setscounter]{$sitekey} = undef;
                        }
                    }

                    $setscounter++;
                }
            }
        }

    which may or may not be nothing at all like what you intended ;).


    DWIM is Perl's answer to Gödel
      Thanks. That looks really great.

      I've been playing with it, and I did hit one snag so far. The arrays stored in @{$matches{$element}{$sitekey}} contain possibly millions of elements, all stored in numerical order. If we assume that the ones being pushed into @arrayA are going to number a dozen or fewer, it seems really wasteful to me to sift through the entire array, for instance, once you've already reached values >= $upperlimit.

      This is why I had included some of the early loop exits and so on. It will (presumably) greatly increase the speed of the program, which is highly desirable. What are your thoughts on modifying your code here:

          for my $hElem (@{$matches{$element}{$sitekey}}) {
              print "...in \$hElem\n";
              if ($hElem >= $lowerlimit && $hElem <= $upperlimit) {
                  push (@arrayA, $hElem);
              }
          }

      To something along these lines instead? :

          for my $hElem (@{$matches{$element}{$sitekey}}) {
              next unless ($hElem >= $lowerlimit);
              last unless ($hElem <= $upperlimit);
              print "...in \$hElem\n";
              push (@arrayA, $hElem);
          }

      Thanks,
      Matt

        The loop with early exit looks fine. However if there are large numbers of elements and the range represents a small fraction of the total number of elements, then I'd be inclined to do a binary search for the two end elements in the range and use an array slice to copy the elements in the range out. You may find Binary search helps as a starting point for the search code.

        Given two end element indexes the following does the slice and copy:

        @arrayA = @{$matches{$element}{$sitekey}}[$first .. $last];
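
As a rough sketch of that idea, the code below finds both ends of the window with a binary search and then takes the slice. The helper name lower_bound, the sample positions and the value of $span are all assumptions made for illustration, and the "+ 1" trick for the upper end assumes the positions are integers.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first element >= $target in a sorted
# array, or the array length if every element is smaller.
sub lower_bound {
    my ($aref, $target) = @_;
    my ($lo, $hi) = (0, scalar @$aref);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($aref->[$mid] < $target) { $lo = $mid + 1 }
        else                         { $hi = $mid }
    }
    return $lo;
}

# Made-up data standing in for @{$matches{$element}{$sitekey}} (sorted).
my @positions  = (5, 120, 340, 350, 362, 900, 4000, 5468, 40048);
my $lowerlimit = 340;
my $span       = 30;
my $upperlimit = $lowerlimit + $span;

# First index inside the window.
my $first = lower_bound(\@positions, $lowerlimit);
# Index just before the first value that exceeds $upperlimit.
my $last  = lower_bound(\@positions, $upperlimit + 1) - 1;

# If nothing falls in the window, $first > $last and the range is empty.
my @arrayA = @positions[$first .. $last];
print "@arrayA\n";
```

With this sample data the slice holds 340, 350 and 362. Each lookup costs a number of comparisons proportional to the logarithm of the array size, which is where the saving over a linear scan of millions of elements comes from.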

Re: Scoping problems in nested loops
by jwkrahn (Abbot) on Nov 14, 2006 at 21:51 UTC
    You have a big glaring bug here:

        if (@arrayA && $hit == length(@{$matches{$fastarray[$h]}{$sitekey}})) {
            $sets{$fastarray[$h]}[$setscounter]{$sitekey} = \@arrayA;
            print "\$sets{$fastarray[$h]}[$setscounter] is:\n";
            print Dumper(%{$sets{$fastarray[$h]}[$setscounter]});
            @arrayA = ();
        }

    You are assigning a reference to @arrayA and then clearing out the contents of @arrayA so it is the same as assigning an empty anonymous array in the first place. Remove the line: "@arrayA = ();". Anyway, it looks like your code could be simplified to:

        for my $fastarray ( @fastarray ) {
            my $hash = $matches{ $fastarray };
            for my $site ( sort { $a <=> $b } keys %$hash ) {
                for my $lowerlimit ( @{ $hash->{ $site } } ) {
                    my $upperlimit = $span + $lowerlimit;
                    for my $sitekey ( sort { $a <=> $b } keys %$hash ) {
                        push @{ $sets{ $fastarray } }, { $sitekey => undef };
                        my @arrayA = grep { $_ >= $lowerlimit && $_ <= $upperlimit } @{ $hash->{ $sitekey } } or next;
                        $sets{ $fastarray }[ -1 ]{ $sitekey } = \@arrayA;
                        print "\$sets{$fastarray}[-1] is:\n", Dumper $sets{ $fastarray }[ -1 ];
                    }
                }
            }
        }
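
The point about references can be seen in isolation with a small sketch. The hash keys shared and copied, the array @per_pass and the sample values are invented for illustration. A stored \@arrayA and @arrayA itself are the same array, so emptying one empties the other, while [@arrayA] stores an independent copy. A my array declared inside a loop body is a fresh array on every pass, so taking a reference to it there is safe.

```perl
use strict;
use warnings;

my %sets;
my @arrayA = (40048, 40060);

# A reference to @arrayA itself: both names refer to the same data.
$sets{shared} = \@arrayA;
# A reference to an anonymous copy: independent of later changes.
$sets{copied} = [@arrayA];

# Clearing the array also clears what $sets{shared} points at.
@arrayA = ();

print "shared has ", scalar @{ $sets{shared} }, " elements\n";
print "copied has ", scalar @{ $sets{copied} }, " elements\n";

# Inside a loop, 'my' creates a new array each time round, so each
# stored reference keeps its own contents.
my @per_pass;
for my $n (1 .. 3) {
    my @fresh = ($n) x $n;
    push @per_pass, \@fresh;
}
print join(",", map { scalar @$_ } @per_pass), "\n";
```

Here shared ends up with 0 elements and copied with 2, and the three per-pass arrays keep sizes 1, 2 and 3.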

Re: Scoping problems in nested loops
by liverpole (Monsignor) on Nov 14, 2006 at 20:22 UTC
    Hi mdunnbass,

    First of all, before you do anything else, put the following 2 lines at the top of your program:

        use strict;
        use warnings;

    Now run your program, and get the output:

        Global symbol "@fastarray" requires explicit package name at x line 5.
        Global symbol "$setscounter" requires explicit package name at x line 6.
        Global symbol "%matches" requires explicit package name at x line 7.
        Global symbol "@fastarray" requires explicit package name at x line 7.
        Global symbol "$site" requires explicit package name at x line 8.
        Global symbol "%matches" requires explicit package name at x line 8.
        Global symbol "@fastarray" requires explicit package name at x line 8.
        Global symbol "%matches" requires explicit package name at x line 9.
        Global symbol "@fastarray" requires explicit package name at x line 9.
        Global symbol "$site" requires explicit package name at x line 9.
        Global symbol "$i" requires explicit package name at x line 10.
        Global symbol "$i" requires explicit package name at x line 11.
        Global symbol "%matches" requires explicit package name at x line 11.
        Global symbol "@fastarray" requires explicit package name at x line 11.
        Global symbol "$site" requires explicit package name at x line 11.
        Global symbol "$lowerlimit" requires explicit package name at x line 12.
        Global symbol "%matches" requires explicit package name at x line 13.
        Global symbol "@fastarray" requires explicit package name at x line 13.
        Global symbol "$site" requires explicit package name at x line 13.
        # ... etc. ...

    Now go and fix those problems; declare each symbol in the outermost scope in which it is needed.  For example:

        my @fastarray;
        my $setscounter;
        my %matches;
        my $site;
        my %sets;

        FASTA: for (my $h=0;$h<(@fastarray);$h++) {

    You shouldn't ever have to declare the same variable twice (otherwise, it's like using a whole different variable).

    That should give you some clues about where your problem is.
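
A minimal sketch of what declaring a variable twice does in practice follows. The loop variable $pass and the variable $shadowed are made up for the example; $setscounter is borrowed from the original code only as a name. A second my in an inner scope creates a new variable that hides the outer one, so assignments to it never reach the outer variable.

```perl
use strict;
use warnings;

# Declared once in the outer scope and updated from inside the loop.
my $setscounter = 0;
for my $pass (1 .. 3) {
    $setscounter++;           # changes the outer variable
}

# Declared again inside the loop: a different variable each time round.
my $shadowed = 0;
for my $pass (1 .. 3) {
    my $shadowed = $pass;     # hides the outer $shadowed; gone after the block
}

print "setscounter = $setscounter\n";
print "shadowed    = $shadowed\n";
```

The outer $setscounter ends at 3, while the outer $shadowed is still 0 because only its inner namesakes were assigned.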

    Update:  added more my declarations at the outermost scope; the other variables can all be declared where they are first used (e.g. my $upperlimit = $span + $lowerlimit;).

    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      Hi. I guess I should have mentioned this in my original post. The section of code that I pasted in here is actually a subset of a subroutine of a much larger program. I am using use strict and use warnings.

      The code itself functions and does not give any warnings (even when I run perl -w). It's just that it's not running the way I want it to.

      I know I should probably run it through perl -d, but considering how far down the pipe in the program this section actually occurs, it'd take me 20 minutes just to get to this point.

      Thanks though, for the thought.
      Matt