in reply to Array of arrays used within object

Is each sequence an object? If so, make addregion() a method, and have it store \@nreg on the object as well.
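A minimal sketch of that first option, in case it helps. The class name SeqRegions and the hash keys id and nreg are made up for illustration, and the body of addregion() is a guess that simply stores a [start, stop] pair; your real addregion() may well do more.

```perl
package SeqRegions;   # hypothetical class name
use strict;
use warnings;

# Constructor: one object per sequence, holding its id and its own region list.
sub new {
    my ($class, $id) = @_;
    my $self = { id => $id, nreg => [] };
    return bless $self, $class;
}

# Record a region as a [start, stop] pair on this object (assumed behaviour).
sub addregion {
    my ($self, $start, $stop) = @_;
    push @{ $self->{nreg} }, [ $start, $stop ];
    return;
}

# Return 1 if $what falls inside any stored region, 0 otherwise.
sub isitin {
    my ($self, $what) = @_;
    for my $region (@{ $self->{nreg} }) {
        return 1 if $what >= $region->[0] and $what <= $region->[1];
    }
    return 0;
}

package main;

my $seq = SeqRegions->new('seq1');
$seq->addregion(10, 20);
print $seq->isitin(15), "\n";   # 15 lies inside 10..20
print $seq->isitin(25), "\n";   # 25 lies outside every stored region
```

Because each object carries its own nreg array reference, there is nothing global to reset between sequences.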

Otherwise, what about using something like this (pseudocode)...

my %nregs;
foreach my $seq (@sequences) {
    local @nreg;                          # fresh, empty @nreg for this sequence
    do_addregion_for_all_regions($seq);
    $nregs{getseqname($seq)} = \@nreg;    # keep this sequence's regions
}

This way, you have a set of @nreg, one for each sequence, keyed on the sequence name.

If I am way off base here, no problem. It has been a while since I have done anything biology-related.

Re: Re: Array of arrays used within object
by knirirr (Scribe) on Aug 07, 2003 at 08:43 UTC
    Apologies for the confusion, and thanks for the suggestions. I'll attempt to clarify.

    >I'm not sure I understand exactly what you mean
    >by "convert subroutines into modules".

    >the variable @nreg which has a scope wider
    >than just the subroutine (e.g., a global variable).

    Originally, I had two subroutines, "addregion" and "isitin". The former is as I posted, and the latter looked like this:
    sub isitin {
        my ($hit,$what) = @_;
        my $within = 0;
        my $k;
        for ($k=0; $k<@nreg; $k++) {
            if ($what >= $nreg[$k][0] and $what <= $nreg[$k][1]) {
                $within = 1;
            }
        }
        return $within;
    }
    ...so that I could get a "1" back if a given base was within any of the regions defined in @nreg. As this was all within one script, it was easy to use @nreg globally and re-set it at the top of a "while (my $seq = $filein->next_seq())" loop.
    However, I'd like to re-use the code by creating an object for each sequence, that would contain the sequence id and the @nreg array. The hard bit is working out how to set up that object - hence all manner of bless errors as I attempt to comprehend the complexities of OO (not easy for an old shell script writer ;-).

    BTW the code above is lifted directly from the original subroutines - hopefully I can keep the changes minimal and add some extra methods. It came about to solve this problem.

    Essentially, I looped through each sequence, called "addregion" on each feature, then "isitin" on related features later in the loop. @nreg was re-set for the next genome. I can't find any code in the version of BioPerl that I'm using that does this job, hence the attempt to write my own object for it.
      Quick and easy optimization (not considering the other changes you are asking about): Replace
      $within = 1;
      with
      return 1;
      From your coded logic, there is no reason to continue looking through the rest of the array if you have found a match. You can also drop the declaration of $within and end the sub with
      return 0;
      since no match was found if it finishes the loop.
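Put together, the sub with both changes applied might look something like this. The sample contents of @nreg are made up for illustration, and $hit is kept only so the call signature stays the same as in your original.

```perl
use strict;
use warnings;

our @nreg = ([10, 20], [40, 50]);   # made-up sample regions, for illustration only

sub isitin {
    my ($hit, $what) = @_;          # $hit is unused, kept to match the original signature
    for my $k (0 .. $#nreg) {
        # Stop at the first region that contains $what.
        return 1 if $what >= $nreg[$k][0] and $what <= $nreg[$k][1];
    }
    return 0;                       # fell through the loop: no region matched
}

print isitin(undef, 45), "\n";      # 45 lies inside 40..50
print isitin(undef, 30), "\n";      # 30 lies between the two regions
```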

        Thanks for the suggestions - I've incorporated those.

        Presumably by references you mean something like:

        $positions{$file} = \@nreg;

        Then calling subroutines with:

        &msatminer::addregion($positions{$file},$start,$stop);

        If so, that would be a quick solution that might allow me to do one thing the code doesn't currently do (and should) — identify points that are found between two close features (there are various biologically interesting cases).

      In the interests of getting some data, I now have a crude substitute for the ideal code (I'll work on that when deadlines are less pressing). It's like this:

      package msatminer;
      sub zap { @nreg = (); }
      # rest of package essentially the same as posted above...

      Thus, by calling "msatminer::zap" before opening each genome file, then running the other methods on the starts and stops, and saving the results before the next "zap", I can at least get valid results out.

      However, if this is somehow very, very naughty in a way I haven't realised, then please tell me ;-)

      Thanks.
        It sounds to me like you understand the problems created by the global variable and the right approach you would like to take in an ideal world--converting the code to an OO design.

        You're just running into the normal problems in trying to figure out how to design and implement objects when you haven't done it a lot before.

        I don't think I can solve the problems you're running into given the information I have at hand, but I would encourage you to take the time to learn Perl OO when you get past the current deadline crunch. Designing the code cleanly from the start helps you avoid many of the roadblocks you might otherwise run into.

        A simpler way to avoid global variables than converting your code completely to OO might be to learn about Perl references and pass the big arrays by reference to your different subroutines.
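As a rough sketch of the by-reference approach: each sub takes the region array as its first argument, so nothing has to be global. The bodies here are guesses (addregion just stores a [start, stop] pair), and the file names genomeA and genomeB are hypothetical.

```perl
use strict;
use warnings;

# Append a [start, stop] pair to the array that $regions refers to.
sub addregion {
    my ($regions, $start, $stop) = @_;
    push @$regions, [ $start, $stop ];
    return;
}

# Return 1 if $what falls inside any region in @$regions, 0 otherwise.
sub isitin {
    my ($regions, $what) = @_;
    for my $region (@$regions) {
        return 1 if $what >= $region->[0] and $what <= $region->[1];
    }
    return 0;
}

my %positions;                            # one region list per genome file
for my $file ('genomeA', 'genomeB') {     # hypothetical file names
    $positions{$file} = [];               # each file gets its own anonymous array
}

addregion($positions{genomeA}, 100, 200);
print isitin($positions{genomeA}, 150), "\n";   # genomeA has a region covering 150
print isitin($positions{genomeB}, 150), "\n";   # genomeB has no regions yet
```

Since every file has its own array in %positions, there is no need to clear anything between files.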