Array of arrays used within object

knirirr has asked for the wisdom of the Perl Monks concerning the following question:

Whilst attempting to convert some subroutines into module, I've run into difficulty with one particular feature. @nreg in the code below is used to store the start and end locations of features in a nucleotide sequence, so that another method may be used to determine whether a given base is within any of the defined regions.

sub addregion 
{
    my ($start,$end) = @_;
    if ($start > $end)
    {
        ($start,$end) = ($end,$start);
    }
    my $done = 0;
    my $k;
    my $tempcoding = 0;

    for ($k=0;$k<@nreg;$k++)
    {
        # calculate sum of bases
        $tempcoding += $nreg[$k][1] - $nreg[$k][0] + 1;

        # new region starts after old region ends
        next if ($nreg[$k][1] < $start);

        # new region ends before old region starts
        if ($end < $nreg[$k][0])
        {
            splice(@nreg,$k,0,[$start,$end]);
            $done = 1;
            last;
        }

        # regions overlap
        $nreg[$k][0] = $start < $nreg[$k][0] ? $start : $nreg[$k][0];
        $nreg[$k][1] = $end > $nreg[$k][1] ? $end : $nreg[$k][1];
        $done = 1;

        # Combine two regions if the new section has brough about an o
+verlap
        while ($k < $#nreg and $nreg[$k][1] > $nreg[$k+1][0])
        {
            $nreg[$k][1] = $nreg[$k+1][1];
            splice(@nreg,$k+1,1);
        }
        last;
    }
    unless ($done)
    {
        push @nreg,[$start,$end];
    }
}
[download]

Several (hundreds, even) sequences may be read, and it's apparently necessary to define a new object relating @nreg to the name of each sequence, as I find that the values for all sequences are pushed into the same array. Can anyone suggest a suitable way to set up a "new" method to relate a different @nreg to a particular sequence name, please? I've tried various permutations, which either use the same array every time or refuse to bless it (e.g. "Attempt to bless into a reference at..." messages appear). Thanks for any suggestions!

Comment on Array of arrays used within object Download Code

Replies are listed 'Best First'.
Re: Array of arrays used within object by esh (Pilgrim) on Aug 06, 2003 at 16:56 UTC
I'm not sure I understand exactly what you mean by "convert subroutines into modules". It looks like one problem you have is the fact that the subroutine "addregion" is accessing the variable @nreg which has a scope wider than just the subroutine (e.g., a global variable). You might change the code to pass this array in, probably by reference. `sub addregion { my ($start,$end,$nreg) = @_; : for ($k=0;$k<@$nreg;$k++) :` [download] You ask for a way to "relate a different @nreg to a particular sequence name". This might be done by turning @nreg from an array into a hash of arrays, where the hash key is the sequence name. `sub addregion { my ($start,$end,$sequence) = @_; : for ($k=0;$k<@{$nreg{$sequence}};$k++) : $tempcoding += $nreg{$sequence}[$k][1] - $nreg{$sequence}[$k][0] + 1;` [download] My primary suggestion would be to try simplify the subroutine so that you pass in all parameters that it uses to operate. This reduces the external interactions and side effects and makes the entire code base simpler to deal with when you have to make changes. It's a bit difficult to offer suggestions without seeing the use of nreg in the rest of the code, and the "bless" error doesn't seem to relate to any of the code presented so that confuses me. Is the code sample you presented the original code or the code after you have started the conversion? Hmmm... After reading the post again, I'm wondering if maybe @nreg has one entry per sequence already, and you're trying to invert the structure of the data to have each sequence be a top leve object with the sub-array of `@nreg[$k]` be stored in the object.	[reply] [d/l] [select]
Re: Array of arrays used within object by MidLifeXis (Monsignor) on Aug 06, 2003 at 17:08 UTC
Is each sequence an object? If so, make `addregion()` a method, and have it store \@nreg on the object as well. Otherwise, what about using something like this (pseudocode)... `my %nregs; foreach <$sequences> { local @nreg; do_addregion_for_all_regions(); $nregs{getseqname($_)} = \@nreg; }` [download] This way, you have a set of @nreg, one for each sequence, keyed on the sequence name. If I am way off base here, no problem. It has been a while since I have done anything biology.	[reply] [d/l] [select]
Re: Re: Array of arrays used within object by knirirr (Scribe) on Aug 07, 2003 at 08:43 UTC
Apologies for the confusion, and thanks for the suggestions. I'll attempt to clarify. >I'm not sure I understand exactly what you mean >by "convert subroutines into modules". >the variable @nreg which has a scope wider >than just the subroutine (e.g., a global variable). Originally, I had two subroutines, "addregion" and "isitin". The former is as I posted, and the latter looked like this: `sub isitin { my ($hit,$what) = @_; my $within = 0; my $k; for ($k=0; $k<@nreg; $k++) { if ($what >= $nreg[$k][0] and $what <= $nreg[$k][1]) { $within = 1; } } return $within; }` [download] ...so that I could get a "1" back if a given base was within any of the regions defined with nreg. As this was all within one script, it was easy to use @nreg globally and re-set it at the top of a "while while (my $seq = $filein->next_seq()) " loop. However, I'd like to re-use the code by creating an object for each sequence, that would contain the sequence id and the @nreg array. The hard bit is working out how to set up that object - hence all manner of bless errors as I attempt to comprehend the complexities of OO (not easy for an old shell script writer ;-). BTW the code above is lifted directly from the original subroutines - hopefully I can keep the changes minimal and add some extra methods. It came about to solve this problem. Essentially, I looped through each sequence, called "addregion" on each feature, then "isitin" on related features later in the loop. @nreg was re-set for the next genome. I can't find any code in the bioperl that I'm using that does this job, hence the attempt to write my own object for it.	[reply] [d/l]
Re: Re: Re: Array of arrays used within object by esh (Pilgrim) on Aug 08, 2003 at 01:26 UTC
Quick and easy optimization (not considering the other changes you are asking about): Replace `$within = 1;` [download] with `return 1;` [download] From your coded logic, there is no reason to continue looking through the rest of the array if you have found a match. You can also drop the declaration of $within and end the sub with `return 0;` [download] since no match was found if it finishes the loop.	[reply] [d/l] [select]
Re: Re: Re: Re: Array of arrays used within object by knirirr (Scribe) on Aug 08, 2003 at 09:29 UTC
Re: Re: Re: Array of arrays used within object - crude substitute by knirirr (Scribe) on Aug 07, 2003 at 15:44 UTC
In the interests of getting some data, I now have a crude substitute for the ideal code (I'll work on that when deadlines are less pressing). It's like this: `package msatminer; sub zap { @nreg = (); } # rest of package essentially the same as posted above...` [download] Thus, calling a “msatminer::zap” before opening each genome file, then running the other methods on the starts and stops, and saving the results before the next “zap” I can at least get valid results out. However, if this is somehow very, very naughty in a way I haven't realised, then please tell me ;-) Thanks.	[reply] [d/l]
Re: Re: Re: Re: Array of arrays used within object - crude substitute by esh (Pilgrim) on Aug 08, 2003 at 01:36 UTC