in reply to Can I speed this up? (repetitively scanning ranges in a large array)

A circle doesn't have a start. The only difference between 1..100 and 0..99 is how you store 100 internally.

You even used zero instead of 10 in your example:

    1234567890  coordinate
    <------- ----------
    +++         range (1,3)
     ++++++     range (2,7)
    ++       +  range (0,2)
    <------- ----------
    5557753113  size of smallest window...

Re^2: Can I speed this up? (repetitively scanning ranges in a large array)
by daverave (Scribe) on Nov 02, 2010 at 06:52 UTC
    Obviously. The only reason I'm using coordinates that start from 1, not zero, is to be consistent with the input and output formats I'm using.

    These are biological data, and the unfortunate convention in all biological databases I'm familiar with is to start counting from 1. The first position in any genome is 1. If I used zeros, I would have to remember to convert between the two systems each time I input/output, and from experience, I quickly forget to do that...

      I would have to remember to convert between the two systems each time I input/output, and from experience, I quickly forget to do that...

      You say it's obvious, then you keep talking as if a circle has a start.

      No conversion is necessary. Just start at index one of the array for the item labeled 1, then keep going for 100 elements, which ends at element zero of the array.

      for (1..100) {
          print $result[$_ % 100];    # coordinate 100 wraps around to index 0
      }
        That's possible, but a bit confusing. In all biological databases, a circular genome of length 100 does have coordinate 100. It doesn't have coordinate zero. So, when I print a range that spans to the end of the genome, I must print it as x..100, not x..0, as this is the convention. So this representation might have some benefits, but it also brings the overhead of remembering to switch back to 'biological' coordinates whenever you input or output...
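
A minimal sketch of the conversion being discussed, assuming a circular genome of length 100 stored with the modulo layout (coordinate 100 at index 0). The names `$genome_size`, `coord_to_index`, and `index_to_coord` are hypothetical and only illustrate where the switch back to 'biological' coordinates would have to happen:

```perl
use strict;
use warnings;

my $genome_size = 100;    # hypothetical circular genome length

# Biological coordinate (1..$genome_size) -> array index (0..$genome_size-1).
# Coordinate $genome_size lands on index 0, as in the modulo loop.
sub coord_to_index {
    my ($coord) = @_;
    return $coord % $genome_size;
}

# Array index -> biological coordinate; index 0 is reported as $genome_size.
sub index_to_coord {
    my ($index) = @_;
    return $index == 0 ? $genome_size : $index;
}

# A range ending at the last position is printed as x..100, never x..0.
print join('..', map { index_to_coord(coord_to_index($_)) } 98, 100), "\n";    # prints 98..100
```

With this layout only index 0 needs special handling on output, but every print of a coordinate still has to go through something like `index_to_coord`.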

        Also note that not all genomes are circular. What would you do about those? If you put the first coordinate in the first position of the array (arr[0]), you will have to add 1 any time you output a coordinate (and subtract 1 any time you input one). If you start from arr[1] you do what I currently do, but now you treat circular genomes and linear genomes differently any time you print or even calculate their length (if the genome is circular, scalar(@arr) == genome size, but if it's linear, scalar(@arr) == genome size + 1). Confusing...
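
To make the length mismatch concrete, here is a rough sketch assuming circular genomes use the modulo layout (coordinate N at index 0) and linear genomes leave index 0 unused. The helper `genome_size` and both arrays are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: genome length from a per-position array.
# Circular: every slot is a real position (coordinate N sits at index 0).
# Linear: index 0 is padding, so the array is one element longer.
sub genome_size {
    my ($positions, $is_circular) = @_;
    return $is_circular ? scalar(@$positions) : scalar(@$positions) - 1;
}

my @circular = (0) x 100;    # indices 0..99
my @linear   = (0) x 101;    # indices 0..100, index 0 unused

print genome_size(\@circular, 1), "\n";    # prints 100
print genome_size(\@linear,   0), "\n";    # prints 100
```

Both genomes have 100 positions, yet the arrays differ in size, so any code that computes a length has to know which kind of genome it is looking at.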

        Anyway, I must admit I'm not sure why we are focusing on this... that's really not the issue. Finally, note that in biology circles do have a start :) For each circular genome, a certain point was selected as '1'. This choice is actually not completely arbitrary: there are some rules for deciding where to place this landmark. Once a genome has been sequenced and published, it has one and only one '1', and any reference to this genome will be relative to this landmark. Just for general knowledge...