in reply to Re^4: Some code optimization
in thread Some code optimization

So if the outer loop over 'n' is the only 'big loop', try putting a broadly scoped counter inside each of the 'little loops' (a unique counter for each), and prove to yourself that they're not running away by mistake. ...just one idea. Another is to check how many times you're calling that grep and the sort, and for how many elements; each of those is an implicit loop.

The idea is to test your theory that the complexity is O(n), and in testing, you may discover that something is not what it seems. I'm not suggesting that your theory is wrong, but that maybe the implementation doesn't match the theory.

If I were hunting down the problem myself, I would be very interested in how many iterations each of the loops runs through. Using a counter variable for each loop, declared in a broad enough scope that it doesn't reset, and then checking it at the end of the run, may smoke out the mole.
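A minimal sketch of that counter idea (the sub and field names here are hypothetical stand-ins, not the code from the thread): the counters are lexicals declared outside any loop, so they accumulate across every call instead of resetting.

```perl
use strict;
use warnings;

# Broadly scoped counters: declared outside the subs/loops so they
# keep accumulating for the whole run.
my ( $little_loop_count, $grep_calls, $grep_elems ) = ( 0, 0, 0 );

sub process_ranges {
    my @ranges = @_;
    for my $r (@ranges) {    # a 'little loop'
        $little_loop_count++;
        # ... real work here ...
    }
    $grep_calls++;
    $grep_elems += scalar @ranges;    # grep visits every element: an implicit loop
    return grep { $_->{len} > 0 } @ranges;
}

# Simulated 'big loop': 5 outer iterations over 10 ranges each
process_ranges( map { +{ len => $_ % 3 } } 1 .. 10 ) for 1 .. 5;

print "little-loop iterations: $little_loop_count\n";    # expect 5 * 10 = 50
print "grep calls: $grep_calls, elements seen: $grep_elems\n";
```

If the printed totals grow faster than iterations * n, one of the "little loops" (or an implicit one inside grep/sort) is bigger than the theory says.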


Dave

Re^6: Some code optimization
by roibrodo (Sexton) on Jun 18, 2010 at 09:11 UTC

    I added a global counter in the innermost loop in intersect_legal_ranges. The counter value at the end is, as expected, equal to the number of iterations * n, plus some small overhead for the ranges that were split at random. E.g., for 10 iterations and 70,000 genes I got 702,810 (i.e. 70,281 per iteration when n = 70,000).

    Also note that the slowdown occurs even when the second problematic subroutine (is_contained) is not called at all (the next before it is on). The first problematic subroutine (gene_to_legal_range) does not use loops at all and is very simple (I hope you agree).