in thread A question About Array Indexing

This is great advice, and it makes the process much more time- and memory-efficient! My one follow-up question concerns this line:

while( <INTERVAL> ) {
    my( $start, $end ) = split "\t", $_;
    ...
}

The text file being read has 3 columns, the first of which is non-numeric. If I specify the variables with

foreach my $interval (<INTERVAL>){
    my @find_interval = split(/\t/, $interval);
    my $start = $find_interval[1];
    my $end   = $find_interval[2];
    ...
}

would that accomplish the same thing?

Re^5: A question About Array Indexing
by AnomalousMonk (Archbishop) on Aug 27, 2013 at 02:00 UTC
    foreach my $interval (<INTERVAL>){
        my @find_interval = split(/\t/, $interval);
        my $start = $find_interval[1];
        my $end   = $find_interval[2];
        ...
    }

    Be aware that this loop will read the entire file accessed by the INTERVAL filehandle into memory at once, as a list with one element per line of the file. The
        while( <INTERVAL> ) { ... }
    loop reads and processes one line at a time: it is much more scalable, and the speed difference is insignificant, if there is one at all.
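    A minimal sketch of the difference, assuming tab-separated lines such as chr1<TAB>100<TAB>200. The sample data is made up, and the in-memory filehandle (opened on a scalar reference) is only there so the example runs without a real file. In the foreach loop, <$fh> is evaluated in list context, so every line is read before the body runs; in the while loop it is evaluated in scalar context and returns one line per iteration.

```perl
use strict;
use warnings;

# Hypothetical sample data: label, start, end (tab-separated).
my $data = "chr1\t100\t200\nchr1\t350\t420\n";

# while: <$fh> is in scalar context, so one line is read per iteration
# and only that line is held in $line.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $while_count = 0;
while ( my $line = <$fh> ) {
    $while_count++;
}
close $fh;

# foreach: <$fh> is in list context, so every remaining line is read
# into a temporary list before the first iteration runs.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $foreach_count = 0;
foreach my $line (<$fh>) {
    $foreach_count++;
}
close $fh;

print "while: $while_count lines, foreach: $foreach_count lines\n";
```

    Both loops see the same lines; the difference is only how much of the file is held in memory at once, which grows with the size of the input for the foreach version.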

    my @find_interval = split(/\t/, $interval);

    I would split directly into the named variables you will be using, and split on '\s' (whitespace) to avoid having a newline stuck to the end of the third field:
        my (undef, $start, $end) = split '\s', $_;
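    A small sketch of why the split pattern matters, using a hypothetical input line "chr1\t100\t200\n". Splitting on "\t" leaves the newline attached to the last field, while splitting on '\s', or calling chomp before splitting on "\t", does not. split discards trailing empty fields by default, which is why the '\s' version does not return an extra empty element for the newline.

```perl
use strict;
use warnings;

my $line = "chr1\t100\t200\n";    # hypothetical input line

# Splitting on tab only: the newline stays attached to the third field.
my ( undef, $start_tab, $end_tab ) = split "\t", $line;    # $end_tab is "200\n"

# Splitting on a single whitespace character: "\n" acts as a separator,
# and the trailing empty field it would create is discarded by split.
my ( undef, $start_ws, $end_ws ) = split '\s', $line;      # $end_ws is "200"

# Alternative: remove the newline first, then split on tab.
my $copy = $line;
chomp $copy;
my ( undef, $start_chomp, $end_chomp ) = split "\t", $copy;    # "200"

printf "tab: [%s]  ws: [%s]  chomp: [%s]\n", $end_tab, $end_ws, $end_chomp;
```

    Note that '\s' matches exactly one whitespace character, so columns separated by runs of spaces would produce empty fields; for such input, split ' ' (the special single-space form) or a chomp followed by a tab split is the safer choice.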

Re^5: A question About Array Indexing
by bioinformatics (Friar) on Aug 27, 2013 at 01:29 UTC

    It would accomplish the same thing, yes. However, the former syntax may be more readable if the array being split isn't huge. So you are masking regions of the genome, I take it?

    Bioinformatics

      Yes indeed, just one chromosome for this program. The intervals are generally small but numerous, since small interspersed sequences have been filtered out: 124,467 intervals, actually. It appears it will take a while to go through all of them, but I can't even imagine doing it the way I first proposed. Thank you for the help!

Re^5: A question About Array Indexing
by BrowserUk (Patriarch) on Aug 27, 2013 at 08:25 UTC
    The text file being read has 3 columns, the first of which is non-numeric. If I specify the variables with foreach ... would that accomplish the same thing?

    Sorry, my mistake. However, I'd stick with while rather than foreach. There is simply no benefit to filling memory with an entire file (however big) if you can only use one line at a time:

    while( <INTERVAL> ) {
        ## ignore the first field on each line
        my( undef, $start, $end ) = split "\t", $_;
        ...
    }
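    A fuller, runnable sketch of the loop above, assuming a three-column tab-separated file whose first column is a non-numeric label. It uses a lexical filehandle instead of the bareword INTERVAL, and the helper name for_each_interval and the sample data are hypothetical. Each (start, end) pair is handed to a callback as soon as its line is read, so nothing beyond the current line is kept in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback->($start, $end) for each line of
# $fh, reading one line at a time and ignoring the first field.
sub for_each_interval {
    my ( $fh, $callback ) = @_;
    while ( my $line = <$fh> ) {
        chomp $line;                   # drop the trailing newline
        next unless length $line;      # skip blank lines
        my ( undef, $start, $end ) = split "\t", $line;
        $callback->( $start, $end );
    }
    return;
}

# Usage with an in-memory file (note the blank second line); a real
# script would open the interval file by name with a three-argument open.
my $data = "chr1\t100\t200\n\nchr1\t350\t420\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @seen;
for_each_interval( $fh, sub { push @seen, "$_[0]-$_[1]" } );
close $fh;

print join( ',', @seen ), "\n";    # prints 100-200,350-420
```

    The callback keeps the reading loop separate from whatever is done with each interval, so the same helper could drive a masking step without the caller ever holding the whole file.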
